Guidance and results for the data rich, yet information poor.
It's going to take more than a mess of information to solve real business problems.

Everybody’s excited about “big data.” Organizations all over the world are leaping on board the Big Data Bandwagon, and are eager to see what it can do.

There’s just one problem. “Big data” isn’t exactly what you think it is. There are a lot of problems with the term, and those problems lead to misconceptions that keep you from making the most of your organization’s data.

“Big Data” is a marketing term.

There’s nothing new about predictive analytics. The science has been around for decades.

Big data has just been the particularly successful marketing term that has catapulted predictive analytics into the consciousness of the mainstream. But sooner or later, “big data” will start to lose steam. What will happen then?

Marketers will still need to sell predictive analytics software. So they’ll make up another term. The new term might be just as successful or it might fall flat, but it will still be a new name for the same science.

You need to know this, because it will keep you from making silly decisions. You will not, for example, leap to buy the most expensive “big data” software package on the market when you know that people have been performing many of the same functions in boring old Microsoft Excel for years and years.

“Big Data” assumes that bigger is better.

In truth, bigger isn’t always better, and “big data” can often mean “having more data than you know what to do with.” Many organizations need to master small data before they start messing around with big data…and many of your insights are going to come from sample sizes that represent only a small portion of the data that your organization has been collecting.
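To see why a small portion of the data can be enough, here is a minimal sketch in Python. It assumes a made-up list of 100,000 order values (the `orders` data and the `sample_mean_vs_full` helper are hypothetical, invented for illustration) and compares the average of the full list with the average of a 1,000-row random sample.

```python
import random
import statistics


def sample_mean_vs_full(values, sample_size, seed=0):
    """Return (full_mean, sample_mean) using a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)  # sampling without replacement
    return statistics.fmean(values), statistics.fmean(sample)


# Hypothetical data: 100,000 synthetic order values, not real figures.
_generator = random.Random(42)
orders = [_generator.gauss(50.0, 10.0) for _ in range(100_000)]

full_mean, sample_mean = sample_mean_vs_full(orders, sample_size=1_000)
print(f"full mean:   {full_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

With data like this, the two averages typically land close together even though the sample is only 1% of the rows. Real data is messier, so treat this as an illustration of the idea rather than a rule about how large a sample must be.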

Stop worrying so much about big. It’s more important to shift your mindset about data. Gathering more and more data can’t help you until you’ve started asking the right questions. Namely: what problem are we attempting to solve by delving into this data? Everything else has to flow from that mindset.

Don’t ignore your data. But don’t romanticize it, either.

Yes, your data does have a lot to tell you. TMA wouldn’t be here if it didn’t.

You just can’t afford to get swept up into the hype. Data is just a tool, a tool that you will hopefully use to discover solutions to some of the problems that your organization is facing.

Data, and what you can do with it, is not magic. It is math. And when you recognize that, you can approach it as a mathematician would, which means ignoring all the hype. Instead, get laser focused on creating a data analytics program that will be truly useful to you.

Data Preparation

Do you know the difference between a great modeler and a mediocre modeler? Many people assume that modeling greatness is wrapped up in the ability to build a better algorithm.

Building a better algorithm is like building a faster rocket ship. It’s great, as long as you’re pointing the ship in the right direction. Nobody wants to move faster if they’re going in the wrong direction, but this is precisely what happens in many organizations. Data preparation is a big part of keeping that rocket pointed in the right direction.

There are two issues which you might encounter during your data preparation phase.

The issue of low-quality data.

The professionals at TMA are often asked whether predictive analytics is possible with “low quality” data. The short answer is — yes, of course it is. The data is what it is, and a modeler can rarely wait for higher quality data to present itself. Instead, the modeler will have to engage with the data, cleaning it up so that it may be used.

There is, of course, a caveat that you should be aware of: you never want to clean your data to the point where the model can’t be deployed in a live environment. You should always do your modeling in the environment where the data is expected to perform.

The issue of low-quantity data.

TMA professionals also hear a lot of concerns about the quantity of data that any given organization might possess. There is always a fear that there just won’t be enough data to complete meaningful projects.

Yes, you do need sufficient data to complete your project.

No, this isn’t often a real problem for the modern organization. The typical organization will have far more data than is necessary to complete most projects.

The modern organization usually has more data than it can handle.

It’s all about getting to know your data.

If either of these issues is poised to become a problem, you will learn about it during the data prep phase. This phase is all about getting to know the data and its limitations so that the data may be applied to the problem at hand. You can’t skip this step: you must understand what your data can do.

This is yet another reason why you can’t just dump data into a software program or an algorithm if you expect to get good results.

But it’s usually possible to solve the problems inherent in the data. You shouldn’t let the state of the data stop you. You should just accept it as part of the process.

All data is dirty. It’s up to you, as the analyst, to improve it.
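Getting to know your data can start with something as simple as counting what is missing. The sketch below is one possible first pass in Python, not a prescribed TMA method. The `profile_column` helper and the `customers` records are hypothetical, and the sketch assumes that `None`, an empty string, or an absent key all count as missing.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]  # absent keys become None
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical records, invented for illustration only.
customers = [
    {"id": 1, "state": "TX", "age": 34},
    {"id": 2, "state": "", "age": 51},
    {"id": 3, "state": "TX", "age": None},
    {"id": 4, "state": "CA"},
]

for column in ("state", "age"):
    print(column, profile_column(customers, column))
```

A summary like this does not clean anything. It only tells you where the gaps are, so you can decide how to handle them before modeling begins.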

Presenting Data Analytics

The primary purpose of data mining and predictive analytics is the creation of actionable insights. But data analytics can’t provide those insights if nobody understands what it says.

That’s why it’s important to consider the best way to present your data analytics. It’s hardly a frivolous question.

However, there is no such thing as “one true data analytics presentation method.”

Everybody processes information differently.

Some people are very talented with numbers, and so want all of their facts lined up in neat, orderly tables. Others are more visual, and need graphs, charts, or other visualizations to really understand what’s going on.

There are two ways to make sure that you are presenting your data in the best possible way.

Method #1: Ask the Audience

One method is to ask the decision makers to whom you will be presenting your results. After all, they will be the ones who need to choose a direction based on the insights you are bringing to them.

Your data mining projects will succeed in proportion to how well you communicate with the organization’s leadership. If management is intimidated or overwhelmed by the information that you are giving them, they are not going to act on it, which means that you won’t be able to produce the kinds of results you were hoping to achieve.

Method #2: Use Multiple Formats

You won’t always have the luxury of asking your audience what they want. You may also need to present your findings to multiple stakeholders, all of whom process the information a little differently.

So it may be in your best interest to choose multiple methods for presenting your data. That way, you’ll increase your chances of helping every stakeholder understand what you are trying to get across to them.
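As a rough illustration of presenting one result in more than one format, the Python sketch below renders the same numbers as a plain table for the numbers-oriented reader and as a text bar chart for the visual one. The `response_rates` figures and both helper functions are hypothetical, and a real presentation would more likely use a charting tool than `#` characters.

```python
def as_table(results):
    """Render name/value pairs as an aligned table of percentages."""
    width = max(len(name) for name in results)
    lines = [f"{name:<{width}}  {value:>6.1%}" for name, value in results.items()]
    return "\n".join(lines)


def as_bar_chart(results, scale=40):
    """Render the same pairs as horizontal bars, `scale` characters per 100%."""
    width = max(len(name) for name in results)
    lines = [
        f"{name:<{width}}  {'#' * round(value * scale)}"
        for name, value in results.items()
    ]
    return "\n".join(lines)


# Hypothetical response rates by customer segment, invented for illustration.
response_rates = {"Segment A": 0.12, "Segment B": 0.30, "Segment C": 0.05}

print(as_table(response_rates))
print()
print(as_bar_chart(response_rates))
```

Both views come from the same dictionary, so the table and the chart cannot drift apart when the numbers are updated.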

It’s all about soft skills.

TMA students are often surprised to hear that “soft skills” are vital to the success of their predictive analytics projects. Many people come to class believing that data mining is primarily about the math.

Nothing could be further from the truth. Navigating organizational politics is vital if you actually want data mining to help your organization.

Want to find out why? Don’t forget to sign up for TMA’s next free webinar, “Data Mining: Failure to Launch”!

A Turtle with Strategic Soft Skills Will Beat a Tactical Data Scientist Rabbit Every Time

by Eric King, President, The Modeling Agency

Sure, It’s Great to Run Fast

I recently read a great article in Software Advice’s Plotting Success blog, authored by Victoria Garment, entitled “3 Ways to Test the Accuracy of Your Predictive Models”. In the post, Garment features the expertise of three legends in the predictive analytics industry: Dr. John Elder, Dr. Karl Rexer and Dean Abbott. I know each of them directly, and they possess a vast amount of practical experience. They each shared highly valuable tips on how to test and validate the accuracy of your predictive models so that they hold up well on deployment. I highly recommend the article and its advice to modeling practitioners at any level who want to avoid the myriad tactical pitfalls that cause models to fall short of their potential or give misleading results.

In this post, I wish to offer up a strategic consideration to complement Garment’s post and ensure that the great efforts to arrive at high-performance validated models do not disintegrate at the project level – which is where the real organizational impact occurs.  In our practice, we see far too often that analytic practitioners strive to build superior models – only to find that they’re not well-aligned with organizational objectives, not well adapted to the overall environment, not ultimately adopted, or not understood by leadership.

But First, Know Where the Finish Line Is

It almost sounds trite, but it is incredibly rare in my company’s experience that analytics teams in large organizations make the effort at the start of an analytic project to undertake a comprehensive assessment and develop a resulting targeted project plan.  They start with data and software, which is like starting a novel on chapter 9.

If leadership’s motivation for the analytics is not known; if users are not on board; if an analytic sandbox wasn’t prepared and IT won’t tolerate multiple pull requests; if significant retrofitting is required after deployment, then even a model with a validated low error rate just won’t survive. It’s akin to setting a fast rabbit 45 degrees off course. Even a turtle with a mediocre model will outperform the rabbit if that model can communicate moderate lift in terms that leadership can apply.

Take the Time to Sharpen Your Axe

The current environment of fast pace and immediate demands is actually costing organizations far more than they realize. Abraham Lincoln said “Give me six hours to chop down a tree and I will spend the first four hours sharpening the axe.” If analytic teams would follow the guidance of public domain modeling process guides that start with the planning exercise of business understanding and data understanding, they would arrive at analytics that are truly measurable, actionable and sustainable.

Instead, most organizations continue to build individual custom cars.  If they took a little extra time to design an analytic factory that churns out well-fitted models nearly at the speed of business demand, imagine how agile and effective that would be.

The Best Solution Combines Sound Tactics and Strategy

In my experience with analytics, a turtle running on a prioritized opportunity and sound project plan will win the race over a technically accurate model that’s misapplied.  However, with modern software and strategic training, there’s no reason not to have both.

The Modeling Agency’s vendor-neutral courseware establishes a purposeful blend of tactical and strategic predictive analytics training.  In fact, the courseware places slightly more emphasis on project planning and strategic implementation than tactical model development.  At the same time, the tactical and strategic orientations may be taken independently, or as a series depending on experience, role and objectives:

[Image: TMA course series]

When taking the full series, these courses will ensure that you build accurate, validated models that avoid common tactical shortcomings like test data memorization and other effects that the Elder / Rexer / Abbott article skillfully presented. At the same time, there is no other courseware on the market that insists upon making predictive analytics actionable at the project level with a purely strategic orientation.

To really stand out with the soft skills to manage a modeling effort at the project level and arrive at measurable results that catch the attention of leadership, first get trained on the strategic practice of analytics, then start with a thorough assessment and project definition.  Your organization will then move beyond the common dysfunction in analytics toward tangible and sustainable performance, and you will have a powerful case summary for your professional profile.

About the Author

Eric A. King is the president and founder of The Modeling Agency, LLC – an advanced analytics training and consulting company providing strategic guidance and impactful results for the data-rich yet information-poor.  Eric is a co-presenter in a popular monthly live, interactive analytics webinar entitled “Data Mining: Failure to Launch”.  Eric may be reached at (281) 667-4200 x210 or eric@the-modeling-agency.com.