Guidance and results for the data-rich, yet information-poor.
Looking for a data scientist is a little like going on a unicorn hunt.

Are you trying to hire a “data scientist” for your organization? You might want to think twice before you decide to place that job ad.

“Data scientist” is either a meaningless designation or a descriptor for a person who will prevent your organization from realizing the full value of the data that it currently owns. Here’s why.

Data scientists typically approach the problem from the wrong direction.

Most “data scientists” focus on building technically superior models. There’s nothing wrong with building a better rocket ship, but first you’d best make sure that the rocket is actually pointed in the right direction.

An “optimized model” is worth little if it isn’t aligned with business objectives, and no business has ever generated a significant benefit from merely building a better algorithm.

In fact, many so-called “data scientists” pooh-pooh strategic assessment and project planning as “fluff” that distracts them from the “real work” of writing ever-more complicated code.

Unfortunately, strategic assessment and project planning happen to be vital if you’re ever going to extract any value from your data.

The term (as most organizations use it) describes a unicorn.

It is impossible–or at least, exceedingly rare–to find all of the skill sets of a so-called “data scientist,” as most companies envision the position, within a single human being. When organizations talk about “data scientists” they typically mean someone who:

  • Has a collection of advanced analytical skills
  • Has vast IT experience
  • Possesses and effectively uses a broad range of managerial soft skills
  • Can oversee analytic processes at the project level

This mythical human somehow has managerial acumen and technical skill all rolled up into one brilliant, convenient package. Someone like this might exist in the sense that anything is possible…kind of like the way unicorns might exist in a universe where anything is possible. Since most organizations don’t have the time or money to embark on a unicorn hunt, it’s smarter to take a step back and think about who or what can actually deliver what the organization hopes to gain by hiring a “data scientist” in the first place.

Anyone can call themselves a data scientist.

Granted, technical skills are awfully hard to fake if you test for a specific skill set. However, there is simply no formal definition for the term, which means no certifications, no degrees, and no quality controls. An unemployed MBA can legally hang out a “data scientist” shingle tomorrow. Often, amateurs do just that, to the detriment of the organizations they attempt to help.

Your existing employees can probably give you what you need.

Believe it or not, your existing employees probably have what it takes to help you derive outstanding value from the data you possess. In fact, training strategic thinkers who are close to the problems your organization is facing is often the first step. Sending key employees through a vendor-neutral training regimen that takes just a few short weeks can help you begin transforming your data into actionable intelligence that offers solid benefits to your business. Doesn’t that sound far better than hiring an overpriced theoretical analytic specialist who is largely incapable of taking your organization where it needs to go?

 

Sometimes you are in a position where you have a problem you need to solve, but you do not necessarily have all of the historical data you want or need to solve it. You don’t have to let this scenario stop you in your tracks. There are always ways to back into a problem.

A large academic institution recently asked TMA to create a predictive modeling surveillance program to detect credit card fraud. The challenge here was that there were no known historical cases of fraud to work from. How, then, to train this model?

TMA solved this problem by using an unsupervised learning approach: clustering behaviors by their distances from one another in a multidimensional feature space, along with pattern matching.

After building the clusters, TMA worked with the users to settle on a number of clusters that they could realistically work with.

The model then stood by for any known cases of fraud to come through the system. When one did, the users could see which segment, or cluster, the behavior mapped to, and target their auditing efforts on that cluster. The model was designed to grow more effective as more cases came in.
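To make the idea concrete, here is a minimal sketch of that kind of approach using scikit-learn’s KMeans. The behavior features, the data, and the cluster count are all illustrative assumptions, not TMA’s actual implementation.

```python
# Minimal sketch of clustering cardholder behavior, then mapping a
# confirmed fraud case to its cluster. All values are illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical per-cardholder features:
# [avg transaction amount, transactions per day, share of foreign merchants]
behaviors = np.array([
    [42.10, 1.3, 0.00],
    [55.75, 2.1, 0.05],
    [980.00, 9.8, 0.60],   # unusual behavior
    [38.20, 1.1, 0.02],
])

# Distance-based clustering needs features on comparable scales.
scaler = StandardScaler()
X = scaler.fit_transform(behaviors)

# Cluster count agreed on with the users (illustrative value).
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# When a confirmed fraud case arrives, find its cluster so auditors
# can focus on the other members of that same cluster.
new_case = scaler.transform([[1020.00, 11.0, 0.55]])
print("Fraud case maps to cluster:", model.predict(new_case)[0])
```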

Cluster analysis has many uses. It is the same sort of process that Netflix uses to recommend your next movie, and that Amazon uses to predict your next purchase.

It is, of course, just one tool out of many. If you want to learn how to match the right tool to the right task you need to seek the appropriate predictive analytics training program. Why not start with TMA’s free webinar? Or sign up for training courses today to learn how to solve sticky problems just like this one.

Here’s a question that came up during a recent TMA Q&A: how long does a predictive analytics project take? It’s actually a very good question, because it is important to count the time commitment that an investment will demand from you, just as it is important to count the initial investment cost.

The answer? It depends.

With the help of modern software you can certainly crank out a quick and dirty model in a matter of hours. This approach carries many strategic risks, but it can be done.
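For a sense of scale, here is roughly what “quick and dirty” looks like with modern tooling. This is a generic scikit-learn sketch on synthetic data, offered only to show how little code a baseline takes, not as a substitute for the strategic work.

```python
# A "quick and dirty" baseline model: minutes of code, none of the strategy.
# Synthetic data stands in for a real business problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```

The strategic risks live in everything this sketch skips: no business objective, no thoughtful data selection, and no validation against the decision the model is supposed to support.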

Going through the entire TMA process takes about six weeks on average. As for receiving results from following that process?

Well, that all depends on what you’re doing.

For example, if you’re using predictive analytics to improve the efficacy of your e-mail marketing campaigns then you’d typically see results very quickly. Someone who is using predictive analytics to improve the efficacy of a direct mail campaign might have to wait longer.

It also depends on whether you are putting together a single project or attempting to methodically build an internal analytics practice. The difference in time is the difference between building a single car and building an entire car factory.

Of course, there are plenty of benefits associated with the longer, harder project that you’ll never see if you remain fixated on building those individual cars!

If you’re not sure how and where to get started on your next project (big or small), why not register for TMA’s next free webinar? You’ll receive a thorough grounding in the concepts and strategies that help a predictive analytics project become successful, and you’ll get your own chance to ask your questions of gurus Tony and Scott after the webinar is complete. Sign up today.

When you’re working with predictive modeling you’re typically going to run into two different types of data sets.

Structured data sets are the easiest to deal with. Your financial data may be mostly structured, for example. You have a fixed dollar amount that came in during a specific period of time. The numbers always mean the same thing.

“Unstructured” data contains important information, but it might not live in any numeric form. You may have thousands of customer service calls or records to sort through. You may be tracking social media content, like the subject matter of your customers’ tweets. Before that data can be used, it will have to be prepared and converted into numbers your model can work with.
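As one illustration of that conversion, a common first step is a bag-of-words or TF-IDF encoding, which turns free text into a numeric matrix. The snippet below uses scikit-learn’s TfidfVectorizer on made-up call notes; the records are assumptions for the example, not real data.

```python
# One common way to turn unstructured text into numbers: TF-IDF.
# The call notes below are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

call_notes = [
    "customer upset about late delivery",
    "asked about upgrade pricing",
    "late delivery again, requested refund",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(call_notes)   # sparse numeric matrix

print(X.shape)                             # (3 records, N distinct terms)
print(vectorizer.get_feature_names_out())  # which column means which term
```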

Most modern data environments are a very messy mix of structured and unstructured data, and usually that data is stored in huge quantities. It’s easy to get overwhelmed when you look at the sheer scope of the data that is available to you.

Fortunately, the problem is not insurmountable if you approach your data in the right way.

First and foremost, you must have a clear idea of what you are trying to accomplish with this data. What are your objectives? Once you know your objectives it’s easier to understand which data won’t make sense for this particular problem set.

Next, you’ll need to take a sample. You’re never going to analyze all of the data that’s available to you, and you don’t need to. You only need a good enough representation of the whole solution space.

In fact, one of the biggest mistakes modelers make is going into the data assuming that “more” is better. In truth, piling on more and more data is a good way to ensure the failure of your model: irrelevant fields add noise and invite the model to learn patterns that aren’t really there.

Fortunately, knowing your targets makes it far easier to decide what you’re going to include. You’ll dive in, pick the data that is relevant to your objective, and then work with that data (and only that data).
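Here is a small pandas sketch of that discipline: declare the columns the objective needs, keep only those, and work from a reproducible sample. The column names, values, and sample fraction are placeholders, not a real schema.

```python
# Objective first, data second: keep only the fields that bear on the
# goal, then work from a representative sample rather than everything.
import pandas as pd

df = pd.DataFrame({
    "customer_id": range(1000),
    "amount":      [25.0 + (i % 40) for i in range(1000)],
    "channel":     ["web" if i % 3 else "phone" for i in range(1000)],
    "notes":       ["free text ..."] * 1000,  # not relevant to this objective
    "churned":     [i % 5 == 0 for i in range(1000)],
})

# Keep only the columns the stated objective actually needs.
relevant = ["amount", "channel", "churned"]
df = df[relevant]

# A reproducible 10% sample is often enough to represent the space.
sample = df.sample(frac=0.10, random_state=42)
print(sample.shape)   # (100, 3)
```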

Don’t jump into the data without setting your course. That’s like starting a book on Chapter 5. Both actions all but ensure you won’t have a clue what’s going on–which means you’re likely to create an even bigger mess than the one you started with.

As you may know, data preparation is usually the most labor- and time-intensive part of a predictive analytics project. What you may not recognize is that the entire preparation phase needs to be documented on an ongoing basis as you complete this phase of the project.

It may not seem important as you’re doing it. But if your model ever needs any revisions you’re going to need to know what you did–and you’re probably not going to be very accurate if you simply try to pull that information out of your own memory banks.

You may also have to use your model again at a later date, with a new set of data. That model simply won’t give you consistent results if you don’t prepare the new data set in the same way you prepared the original data set. Remember, you may be translating unstructured data (such as the content of recorded customer service calls) into numbers so that your model can actually read what happened. If you use different numbers the second time around, the model is going to produce very different results.
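One way to guarantee that consistency in code, as a sketch rather than TMA’s documented process, is to fit your preparation steps once, save the fitted objects, and reuse them verbatim on the new data. The encoder choice, file name, and categories below are illustrative assumptions.

```python
# Fit the preparation step once, persist it, and reuse it on new data
# so the same inputs always map to the same numbers.
import joblib
from sklearn.preprocessing import OrdinalEncoder

# Original training data: categorical call outcomes (illustrative).
train_outcomes = [["refund"], ["upgrade"], ["complaint"], ["refund"]]

encoder = OrdinalEncoder().fit(train_outcomes)
joblib.dump(encoder, "call_outcome_encoder.joblib")  # persist alongside your docs

# Months later, with a new data set: load the saved encoder, don't refit.
encoder = joblib.load("call_outcome_encoder.joblib")
print(encoder.transform([["complaint"], ["upgrade"]]))
```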

This documentation process should continue through all six phases of the model’s development.

Fortunately, TMA has created a way to make your data prep documentation much easier. TMA offers an Excel Workbook template which makes it very easy to document and capture your progress on nearly any kind of predictive analytics project. You will receive this spreadsheet with your other course materials when you sign up for one of TMA’s training classes.

Documentation helps you deal with any data preparation issues on an ongoing basis. If you don’t use TMA’s process, it’s still very important to develop your own method. Otherwise, you’ll be starting from scratch every time a question comes up about the model or models that you’ve been developing.