Guidance and results for the data-rich, yet information-poor.

Paul Kautza, Director of Education for TDWI, took some time to talk to TMA’s Training Director Tony Rathburn at the 10th Anniversary European TDWI Conference. You can watch the interview in the video below.

Tony’s comments on his European classes could easily apply to TMA’s training classes as well: “One of the major hurdles that organizations are facing is that we have a business perspective on the problem, we have an IT perspective on the problem, and we have a quantitative perspective on analytics projects. People are very skilled at what they’re doing, but they don’t talk to each other very well.”

All of these stakeholders must learn how to talk to each other if data mining and predictive analytics projects are to succeed.

Does Quantity Really Trump Quality?

 

During June’s Q&A session, TMA’s Senior Consultant and Training Director Tony Rathburn said, “Many organizations need to worry about small data before they worry about big data.” Tony adds that having a greater quantity of data can either enhance precision or add distortion.

These sentiments are echoed by James Guszcza and Bryan Richardson in their excellent essay, “Two Dogmas of Big Data: Understanding the Power of Analytics for Predicting Human Behavior.”

What is small data, and what can organizations do with it?

Small data is readily available, less controversial, and much higher in value.

Guszcza and Richardson used the example of a university admissions office to drive their points home. The office wanted to admit the students who truly had the best chance of succeeding at the university. The data set used to solve the problem was already at hand: high school transcript data, which the students had voluntarily provided as part of the routine admissions process.

This data is not particularly glamorous or unstructured. There’s nothing new about using transcript information to make admissions decisions. The difference was in the tools the office was now using to make better admissions decisions.

Takeaway question: what readily available, uncontroversial data sources might your organization have on hand right now? Chances are that data already hints at some aspect of human behavior, something your organization wants either more of or less of. And although the vast majority of most large organizations’ data is text or other unstructured data, the biggest bang for an initial effort comes from your readily available structured data.

For example, suppose you take credit applications for an item that you sell. Those applications contain voluntarily provided small data, namely ZIP codes, which can hint at average home values. Each application also contains information about how much money each prospect makes.

In the retail industry, perhaps some portion of your customer base regularly experiences “buyer’s remorse,” which leads to chargebacks and a loss of money. You want to minimize this behavior (chargebacks). The relatively small structured data on those credit applications may be the primary source of information you need to make a substantial impact on reducing returns. It may tell you that you need to stop sending your reps to homes that are worth $300,000 or more. You and your reps have been assuming that a $300,000 home means the buyer has enough money to purchase the product. In truth, the data may reveal that homeowners at that price point who carry certain levels of debt, and who meet a few other criteria that the model recognizes in combination, are commonly “in over their heads” and unable to take on more credit obligations. The real sweet spot for your sales force may instead lie within homes in the $150,000 to $225,000 range, along with several other factors that the predictive model considers for an overall targeting score.
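As a rough illustration of the first step in that kind of analysis, the sketch below groups credit applications by home-value band and computes the chargeback rate in each band. Everything in it is an assumption made for illustration: the records are made up, the band edges simply mirror the price points mentioned above, and the names `APPLICATIONS`, `BANDS`, `band_for`, and `chargeback_rate_by_band` are hypothetical. A single-variable breakdown like this is not the multi-factor predictive model described above; it only shows how little structured data is needed to start asking the question.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only:
# (estimated home value in USD, whether the sale ended in a chargeback)
APPLICATIONS = [
    (160_000, False),
    (180_000, False),
    (210_000, True),
    (220_000, False),
    (310_000, True),
    (340_000, True),
    (360_000, False),
    (400_000, True),
]

# Band edges are assumptions chosen to mirror the price points in the text.
# Each band covers low <= value < high.
BANDS = [
    ("under $150k", 0, 150_000),
    ("$150k-$225k", 150_000, 225_000),
    ("$225k-$300k", 225_000, 300_000),
    ("$300k and up", 300_000, float("inf")),
]


def band_for(home_value):
    """Return the label of the band that contains home_value."""
    for label, low, high in BANDS:
        if low <= home_value < high:
            return label
    raise ValueError(f"home value out of range: {home_value}")


def chargeback_rate_by_band(applications):
    """Return {band label: share of applications that ended in a chargeback}."""
    counts = defaultdict(lambda: [0, 0])  # label -> [chargebacks, total]
    for home_value, charged_back in applications:
        label = band_for(home_value)
        counts[label][0] += int(charged_back)
        counts[label][1] += 1
    return {label: cb / total for label, (cb, total) in counts.items()}


rates = chargeback_rate_by_band(APPLICATIONS)
for label, rate in rates.items():
    print(f"{label}: {rate:.0%} chargebacks")
```

With real application data, a breakdown like this would be a starting point for deciding which additional fields (debt levels, income, and so on) are worth feeding into a proper model.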

You don’t have to dig deeply into your existing data in order for a model to arrive at highly impactful results. You probably could have run the same highly effective predictions with the same data decades ago, as well.

Small data does not require Big Data’s “3Vs.”

“Big” data comes with problems of its own. By its very definition “it has such high volume, variety, and velocity so as to create problems for traditional data processing and analysis technologies” (Guszcza and Richardson). Most organizations don’t truly need or want to get anywhere near Big Data, even if they don’t know it. Warehousing all that data becomes a huge project in its own right, one that is low value for most companies.

Small data holds high value with low effort.

As The Modeling Agency insists to its clients, even if the vast majority of the data that you collect and store is unstructured (textual) “Big Data,” start your analytics with your available structured (numeric) data.

Structured data is like pressure-laden oil reserves at the surface, as opposed to Big Data, which is deep and challenging to extract. There is certainly value in Big Data, but like drilling deep and sideways, it takes far more effort for incremental gain. Further, organizations will have far greater impact with a sound strategic approach to analytics, one that ensures actionable models that fit the environment and align well with organizational objectives, than by attempting to transform textual data into an optimized model that fails for a host of strategic reasons.

The Modeling Agency guides organizations to conduct a comprehensive assessment of their resources, environment, situation and objectives to identify valid opportunities for actionable and measurable results that leadership will value. Part of the assessment is to consider all available data resources, big or small, structured or unstructured, to ensure they support organizational priorities and to develop a sound project definition for a superior analytic process. From there, TMA guides the organization’s analytic practitioners through a 6-Phase modeling process to assume full ownership of a highly effective and impactful analytic practice.

Don’t buy into the industry Big Data hype. Those who break from the buzz and apply sound strategic implementation, backed by training and experienced guidance, will leapfrog their competition.

 


The Data Warehousing Institute invited TMA’s Tony Rathburn to deliver the keynote speech entitled “Enhanced Resource Allocation: Business Use of Predictive Analytics and Data Mining” at their World Conference in Boston. Watch the full keynote presentation as Tony emphasizes how most analytic practitioners have tunnel vision on the wrong end of the problem. Tony reveals how data scientists are building more-than-accurate models, but falling short at the project level of arriving at results that are truly actionable, understandable, and measurable.

Watch for Tony’s story of a data mining project gone wrong about halfway through the video. It’s an excellent demonstration of how anyone can develop elegant solutions for all the wrong problems.

People often ask about various algorithms during TMA webinars. Often, the question revolves around whether or not TMA covers this or that algorithm in any training course.

But an algorithm is just a tool in a larger tool box, so this question is a little bit like asking a carpenter whether or not he ever uses his hammer, or his screwdriver.

It’s concerning to watch so many people get fixated on this or that analytical method. No algorithm or analysis method can magically solve business problems.

Instead, it’s up to the analyst to solve business problems by learning what the algorithm can do, just as it’s up to the carpenter to learn how to build a house by knowing when it’s time to use the hammer and when it’s time to use the screwdriver.

Otherwise, you’re just throwing data into a system. The system’s going to spit out a result, but it won’t necessarily spit out a useful result.

If you don’t learn how to master your tools, then your tools will master you. When that happens, you will be unable to realize the full potential of data mining and predictive analytics.

Want to ask questions of your own, and get answers from gurus Tony and Scott? Sign up for TMA’s next free webinar and learn how to launch your predictive analytics projects successfully!

It's going to take more than a mess of information to solve real business problems.

Everybody’s excited about “big data.” Organizations all over the world are leaping on board the Big Data Bandwagon, and are eager to see what it can do.

There’s just one problem. “Big data” isn’t exactly what you think it is. There are a lot of problems with the term, and those problems lead to misconceptions that keep you from making the most of your organization’s data.

“Big Data” is a marketing term.

There’s nothing new about predictive analytics. The science has been around for decades.

Big data has just been the particularly successful marketing term that has catapulted predictive analytics into the consciousness of the mainstream. But sooner or later, “big data” will start to lose steam. What will happen then?

Marketers will still need to sell predictive analytics software. So they’ll make up another term. The new term might be just as successful or it might fall flat, but it will still be a new name for the same science.

You need to know this, because it will keep you from making silly decisions. You will not, for example, leap to buy the most expensive “big data” software package on the market when you know that people have been performing many of the same functions in boring old Microsoft Excel for years and years.

“Big Data” assumes that bigger is better.

In truth, bigger isn’t always better, and “big data” can often mean “having more data than you know what to do with.” Many organizations need to master small data before they start messing around with big data, and many of your insights are going to come from samples that represent only a small portion of the data that your organization has been collecting.
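To make the point about samples concrete, here is a minimal sketch using only Python’s standard library. It assumes a synthetic “population” of 100,000 yes/no flags in which exactly 10% are true, then estimates that rate from a random sample of 1,000 records, or 1% of the data. The population, the sample size, the seed, and all variable names are assumptions for illustration, not figures from any real project.

```python
import random

# Synthetic population (an assumption for illustration): 100,000 records,
# exactly every tenth one flagged True, so the true rate is 10%.
population = [i % 10 == 0 for i in range(100_000)]
true_rate = sum(population) / len(population)

# Draw a simple random sample of 1,000 records (1% of the population).
# A fixed seed keeps the run repeatable.
rng = random.Random(0)
sample = rng.sample(population, 1_000)
sample_rate = sum(sample) / len(sample)

print(f"rate in full population: {true_rate:.3f}")
print(f"rate estimated from 1% sample: {sample_rate:.3f}")
```

For a simple random sample of this size, the estimate typically lands within a point or two of the true rate, which is often precise enough to guide a business decision. The caveat is that the sample must be drawn in a way that represents the question you are asking; a biased sample adds distortion no matter how large it is.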

Stop worrying so much about big. It’s more important to shift your mindset about data. Gathering more and more data can’t help you until you’ve started asking the right questions. Namely: what problem are we attempting to solve by delving into this data? Everything else has to flow from that mindset.

Don’t ignore your data. But don’t romanticize it, either.

Yes, your data does have a lot to tell you. TMA wouldn’t be here if it didn’t.

You just can’t afford to get swept up into the hype. Data is just a tool, a tool that you will hopefully use to discover solutions to some of the problems that your organization is facing.

Data, and what you can do with it, is not magic. It is math. And when you recognize that, you can approach it as a mathematician would, which means ignoring all the hype. Instead, get laser focused on creating a data analytics program that will be truly useful to you.