Guidance and results for the data-rich, yet information-poor.

Sometimes you have a problem you need to solve, but you do not have all of the historical data you want or need to solve it. You don’t have to let this scenario stop you in your tracks. There are always ways to back into a problem.

A large academic institution recently asked TMA to create a predictive modeling surveillance program to detect credit card fraud. The challenge here was that there were no known historical cases of fraud to work from. How, then, to train this model?

TMA solved this problem by using an unsupervised learning approach. The idea was to cluster behaviors according to how close they sit to one another in a multidimensional space, along with pattern matching.

After building the clusters, TMA worked with the users to settle on a number of clusters that they could work with.

The model was then prepared to stand by for any known cases of fraud to come through the system. The users could then see which segment or cluster the behavior mapped to. They would then target any auditing efforts on that cluster. The model was designed to grow more effective as more cases came in.
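To make the idea concrete, here is a minimal sketch of the approach described above, not TMA’s actual model. It assumes a plain k-means clustering, two made-up features per card (average transaction amount and transactions per day), and hypothetical helper names (`nearest`, `kmeans`). Real features would normally be scaled first so that no single one dominates the distance.

```python
import math
import random

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns k centroids for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Hypothetical features per card: (average transaction amount, transactions per day)
transactions = [
    (20.0, 1.0), (25.0, 2.0), (22.0, 1.5),      # small, infrequent purchases
    (400.0, 9.0), (420.0, 10.0), (390.0, 8.0),  # large, frequent purchases
]
centroids = kmeans(transactions, k=2)

# When a confirmed fraud case arrives, see which cluster it maps to,
# then pull every card in that cluster for audit.
confirmed_fraud = (410.0, 9.5)
fraud_cluster = nearest(confirmed_fraud, centroids)
audit_targets = [t for t in transactions if nearest(t, centroids) == fraud_cluster]
print(audit_targets)
```

Each new confirmed case refines which clusters deserve attention, which is why a model like this can grow more useful as cases accumulate.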

Cluster analysis has many uses. It is the same sort of process that Netflix uses to predict your next movie. It’s also the same process Amazon uses to predict your next purchase.

It is, of course, just one tool out of many. If you want to learn how to match the right tool to the right task, you need to seek out an appropriate predictive analytics training program. Why not start with TMA’s free webinar? Or sign up for training courses today to learn how to solve sticky problems just like this one.

Here’s a question that came up during a recent TMA Q&A. It’s actually a very good question, as it is important to count the time commitment that an investment will demand from you, just as it is important to count the initial investment cost.

The answer? It depends.

With the help of modern software you can certainly crank out a quick and dirty model in a matter of hours. This approach carries many strategic risks, but it can be done.

Going through the entire TMA process takes about six weeks on average. As for receiving results from following that process?

Well, that all depends on what you’re doing.

For example, if you’re using predictive analytics to improve the efficacy of your e-mail marketing campaigns, then you’d typically see results very quickly. Someone who is using predictive analytics to improve the efficacy of a direct mail campaign might have to wait longer.

It also depends on whether you are putting together a single project or attempting to methodically build an internal analytics practice. The difference in time is the difference between building a single car and building an entire car factory.

Of course, there are plenty of benefits associated with the longer, harder project that you’ll never see if you remain fixated on building those individual cars!

If you’re not sure how and where to get started on your next project (big or small), why not register for TMA’s next free webinar? You’ll receive a thorough grounding in the concepts and strategies that help a predictive analytics project become successful, and you’ll get your own chance to ask your questions of gurus Tony and Scott after the webinar is complete. Sign up today.

When you’re working with predictive modeling, you’re typically going to run into two different types of data sets.

Structured data sets are the easiest to deal with. Your financial data may be mostly structured, for example. You have a fixed dollar amount which came in during a specific period of time. The numbers always mean the same thing.

“Unstructured” data contains important information, but it might not live in any numeric form. You may have thousands of customer service calls or records to sort through. You may be tracking social media content, like the subject matter of your customers’ tweets. Before that data can be used, it has to be prepared and converted into numeric form.
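As a rough illustration of that conversion step, the sketch below turns a free-text call note into a row of keyword counts. The keyword list, the `text_to_features` helper, and the sample note are all hypothetical; real projects often use richer text-mining techniques, but the principle of mapping text to consistent numbers is the same.

```python
import re
from collections import Counter

# Hypothetical keywords chosen to match a business objective (e.g. churn risk).
KEYWORDS = ["refund", "cancel", "late", "thanks"]

def text_to_features(text, keywords=KEYWORDS):
    """Count how often each keyword appears in text; returns ints in keyword order."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return [words[k] for k in keywords]

call_note = "Customer asked for a refund, said the order was late, and may cancel. Refund issued."
print(text_to_features(call_note))  # [2, 1, 1, 0]
```

Each position in the output always refers to the same keyword, so the numbers mean the same thing from one record to the next, just as they do in structured data.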

Most modern data environments are a very messy mix of structured and unstructured data, and usually that data is stored in huge quantities. It’s easy to get overwhelmed when you look at the sheer scope of the data that is available to you.

Fortunately, the problem is not insurmountable if you approach your data in the right way.

First and foremost, you must have a clear idea of what you are trying to accomplish with this data. What are your objectives? Once you know your objectives it’s easier to understand which data won’t make sense for this particular problem set.

Next, you’ll need to take a sample. You’re never going to analyze all of the data that’s available to you. You only need to put together a good enough representation of the whole solution space.
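One simple way to take such a sample is a reproducible random draw. The sketch below assumes a simple random sample is adequate for the problem (stratified sampling may be a better fit when some groups are rare); the `take_sample` helper, the 1% fraction, and the list of record IDs are illustrative only.

```python
import random

def take_sample(records, fraction, seed=42):
    """Return a reproducible simple random sample of records (without replacement)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)

records = list(range(100_000))       # stand-in for customer record IDs
sample = take_sample(records, 0.01)  # work with 1% of the data
print(len(sample))
```

Fixing the seed means the same sample can be regenerated later, which helps when the work has to be revisited.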

In fact, one of the biggest mistakes modelers make is going into the data assuming that “more” is better. In truth, piling on more and more data without regard to its relevance is a good way to undermine your model.

Fortunately, knowing your targets makes it far easier to decide what you’re going to include. You’ll dive in, pick the data that is relevant to your objective, and then work with that data (and only that data).

Don’t jump into the data without setting your course. That’s like starting a book on Chapter 5. Both actions all but ensure you won’t have a clue what’s going on–which means you’re likely to create an even bigger mess than the one you started with.

As you may know, data preparation is usually the most labor- and time-intensive part of a predictive analytics project. What you may not recognize is that the entire preparation phase needs to be documented as you go, not after the fact.

It may not seem important as you’re doing it. But if your model ever needs any revisions you’re going to need to know what you did–and you’re probably not going to be very accurate if you simply try to pull that information out of your own memory banks.

You may also have to use your model again at a later date, with a new set of data. That model simply won’t give you consistent results if you don’t prepare the new data set in the same way you went about preparing the original data set. Remember you may be translating unstructured data (such as the content of recorded customer service calls) into numbers so that your model can actually read what happened. If you use different numbers the second time around the model’s going to produce very different results.
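Here is a minimal sketch of what “preparing the new data the same way” can look like in practice. It assumes a simple category-to-integer encoding stored as JSON; the helper names (`fit_encoding`, `apply_encoding`) and the call categories are hypothetical, and the `-1` code for unseen categories is just one possible convention.

```python
import json

def fit_encoding(values):
    """Assign each distinct category a stable integer code, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def apply_encoding(values, encoding, unknown=-1):
    """Translate categories to codes; categories never seen before map to `unknown`."""
    return [encoding.get(v, unknown) for v in values]

# First project: derive the encoding from the original data and record it.
original = ["complaint", "praise", "question", "complaint"]
encoding = fit_encoding(original)
saved = json.dumps(encoding, sort_keys=True)  # store this alongside the model notes

# Later: reload the documented encoding instead of rebuilding it from the new data.
new_data = ["question", "complaint", "billing"]
restored = json.loads(saved)
print(apply_encoding(new_data, restored))  # [2, 0, -1]
```

Had the encoding been rebuilt from the new data, “complaint” and “question” would have received different numbers, and the model would be reading a different language than the one it was trained on.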

This documentation process should continue through all six phases of the model’s development.

Fortunately, TMA has created a way to make your data prep documentation much easier. TMA offers an Excel Workbook template which makes it very easy to document and capture your progress on nearly any kind of predictive analytics project. You will receive this spreadsheet with your other course materials when you sign up for one of TMA’s training classes.

Documentation helps you deal with any data preparation issues on an ongoing basis. If you don’t use TMA’s process, it’s still very important to develop your own method. Otherwise, you’ll be starting from scratch every time there’s any question about the model or models that you’ve been developing.

Predictive analytics offers one outstanding strength: it helps to eliminate the inconsistencies in human behavior. It even helps you eliminate your own inconsistencies. You can start making decisions without any fear of unconscious bias.

For example, many employers are biased against hiring candidates who have significant gaps in their resumes. This bias harms a huge portion of the population who lost their jobs, often through no fault of their own, at the beginning of the Great Recession. Predictive analytics has now shown that this attitude is actually costing companies money.

Carl Tsukahara of Evolv writes:

It’s now recognized that the use of predictive analytics can surface powerful conclusions from disparate data sources that can, in turn, serve as the catalyst to foster change in business culture, improve hiring and management practices, and enable more Americans to find gainful employment in fulfilling roles. In my own experience at Evolv, just one of the harmful hiring biases we’ve used predictive analytics to debunk is that “People who haven’t worked recently aren’t viable candidates.” Our technology platform looked across millions of data points on employees across our customer network to prove that the long term unemployed perform no worse than those without an extended jobless spell and have empowered our clients (including several of the companies that supported this week’s legislation) to hire those candidates using a predictive score based on this same technology. We hope this finding in particular helps that 32 percent get that interview, that call back – that chance to show employers that they too, can be great additions to a team.

Some 300 companies are making significant changes in response to these findings, and they are seeing a benefit. According to the above-referenced article, Xerox was one of these companies. Changing their policies in response to these findings resulted in a 20% reduction in their attrition rate, which in turn saved the company a great deal of money.

So what’s the takeaway for you, and your business? Well, you’ve already gotten the benefit of someone else’s predictive analytics project today: you know that you can hire the long-term jobless without creating any problems for your business. You might even solve some.

But you can also get a great deal more out of predictive analytics just by recognizing that you can use predictive analytics to challenge your own assumptions. You could be making other assumptions about hiring, about employee benefits, about vendors, or about any other aspect of your business. You may think those assumptions are bringing great results…but does the data support your claims?

Asking yourself these questions takes your business intelligence program to the next level. You begin by diving into the problems you know about. You know you want more sales, so you start there. Challenging your assumptions, however, will grant you a path to the problems you just aren’t aware of yet–problems that are costing you time, costing you money, and costing you your next star employee.