
Small Data Before Big Data

Does Quantity Really Trump Quality?

During June’s Q&A session, TMA Senior Consultant and Training Director Tony Rathburn said, “Many organizations need to worry about small data before they worry about big data.” He added that a greater quantity of data can either enhance precision or add distortion.

These sentiments are echoed by James Guszcza and Bryan Richardson in their excellent essay, “Two Dogmas of Big Data: Understanding the Power of Analytics for Predicting Human Behavior.”

What is small data, and what can organizations do with it?

Small data is readily available, less controversial, and much higher in value.

Guszcza and Richardson drive their points home with the example of a university admissions office. The office wanted to admit students who truly had the best chance of succeeding at the university. The data set needed to solve the problem was already at hand: high school transcript data, which the students had voluntarily provided as part of the routine admissions process.

This data is not particularly glamorous or unstructured. There’s nothing new about using transcript information to make admissions decisions. The difference was in the tools the office was now using to make those decisions better.
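As a minimal sketch of what such a tool might look like, the snippet below ranks applicants with a simple logistic score over transcript fields. The field names, coefficients, and applicant data are all hypothetical illustrations, not taken from the essay; in practice the coefficients would be fit on the university’s own historical outcomes.

```python
import math

# Hypothetical coefficients, as if fit on historical transcript outcomes.
INTERCEPT = -6.0
WEIGHTS = {"gpa": 1.8, "honors_courses": 0.35}

def graduation_probability(gpa: float, honors_courses: int) -> float:
    """Logistic score: estimated probability that an applicant succeeds."""
    z = INTERCEPT + WEIGHTS["gpa"] * gpa + WEIGHTS["honors_courses"] * honors_courses
    return 1.0 / (1.0 + math.exp(-z))

# Rank a small applicant pool by predicted success, highest first.
applicants = [
    ("Applicant A", 3.9, 5),
    ("Applicant B", 2.4, 0),
    ("Applicant C", 3.1, 2),
]
ranked = sorted(applicants, key=lambda a: graduation_probability(a[1], a[2]), reverse=True)
```

The point is not the particular model: it is that data the office already had, plus a modest scoring tool, can turn routine paperwork into a ranked admissions decision aid.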

Takeaway question: what readily available, uncontroversial data sources might your organization have on hand right now? Chances are that data already hints at some aspect of human behavior, something your organization wants more or less of. And although the vast majority of most large organizations’ data is text or other unstructured data, the biggest bang for an initial effort comes from your readily available structured data.

For example, suppose you take credit applications for an item you sell. Those applications contain voluntary bits of small data, such as ZIP codes, which hint at average home values. Each application also records how much money the prospect makes.

In the retail industry, perhaps some portion of your customer base regularly experiences “buyer’s remorse,” which leads to chargebacks and lost money. You want to minimize that behavior. The relatively small amount of structured data on those credit applications may be the primary source of information you need to make a substantial dent in returns. It may tell you to stop sending reps to homes worth $300,000 or more. You and your reps have been assuming that a $300,000 home means the buyer can afford the product. In truth, the data may reveal that homeowners at that price point who carry certain levels of debt, along with a few other criteria the model recognizes in combination, are commonly “in over their heads” and unable to take on more credit obligations. The real sweet spot for your sales force may instead lie in homes in the $150,000 to $225,000 range, combined with several other factors the predictive model folds into an overall targeting score.
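A hand-set sketch of such a targeting score is below. The thresholds and weights are purely illustrative, echoing the hypothetical scenario above; a real model would learn them from historical chargeback outcomes rather than from hard-coded rules.

```python
def targeting_score(home_value: int, debt_to_income: float) -> float:
    """Toy targeting score for a sales visit (higher = better prospect).

    Illustrative rules only: homes in the $150,000-$225,000 band score
    best, while pricier homes carrying heavy debt are penalized as
    likely over-leveraged.
    """
    score = 0.0
    if 150_000 <= home_value <= 225_000:
        score += 2.0   # the hypothetical "sweet spot"
    elif home_value >= 300_000:
        score -= 1.0   # an expensive home alone is no guarantee
    if debt_to_income > 0.40:
        score -= 1.5   # heavy existing credit obligations
    return score

# A $180k home with modest debt outranks a $350k home with heavy debt.
good = targeting_score(180_000, 0.25)
risky = targeting_score(350_000, 0.55)
```

Even this toy version shows how a few structured fields combine into a single score your sales force can act on, which is exactly the kind of result small data delivers.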

You don’t have to dig deeply into your existing data in order for a model to arrive at highly impactful results. You probably could have run the same highly effective predictions with the same data decades ago, as well.

Small data does not require Big Data’s “3Vs.”

“Big” data comes with problems of its own. By its very definition, it has “such high volume, variety, and velocity as to create problems for traditional data processing and analysis technologies” (Guszcza and Richardson). Most organizations don’t truly need or want to get anywhere near Big Data, even if they don’t realize it. Warehousing all that data becomes a huge project in its own right, and a low-value one for most companies.

Small data holds high value with low effort.

As The Modeling Agency insists to its clients: even if the vast majority of the data you collect and store is unstructured (textual) “Big Data,” start your analytics with your available structured (numeric) data.

Structured data is like a pressurized oil reserve at the surface, while Big Data is deep and challenging to extract. There is certainly value in Big Data, but like drilling deep and sideways, it takes far more effort for incremental gain. Further, organizations gain far more from a sound strategic approach to analytics, one that ensures actionable models that fit the environment and align with organizational objectives, than from attempting to transform textual data into an optimized model that fails for a host of strategic reasons.

The Modeling Agency guides organizations through a comprehensive assessment of their resources, environment, situation, and objectives to identify valid opportunities for actionable, measurable results that leadership will value. Part of the assessment is to consider all available data resources, big or small, structured and unstructured, to ensure they support organizational priorities and to develop a sound project definition for a superior analytic process. From there, TMA guides the organization’s analytic practitioners through a six-phase modeling process until they assume full ownership of a highly effective and impactful analytic practice.

Don’t buy into the industry’s Big Data hype. Those who step past the buzz and apply sound strategic implementation, backed by training and experienced guidance, will leapfrog their competition.

