Data Too Messy? Don’t Panic.

When you’re working with predictive modeling, you’re typically going to run into two different types of data sets.

Structured data sets are the easiest to deal with. Your financial data may be mostly structured, for example. You have a fixed dollar amount which came in during a specific period of time. The numbers always mean the same thing.

“Unstructured” data contains important information, but it might not live in any numeric form. You may have thousands of customer service calls or records to sort through. You may be tracking social media content, like the subject matter of your customers’ tweets. Before that data can be used, it has to be prepared and converted into numeric form.
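As a minimal sketch of what "converting to numeric form" can look like, the Python snippet below counts how often a few keywords appear in free-text call notes. The notes, the keyword list, and the function name keyword_counts are all invented for illustration; real text preparation is usually more involved than simple keyword counting.

```python
import re
from collections import Counter


def keyword_counts(text, keywords):
    """Return how many times each keyword appears in a piece of free text."""
    # Lowercase the text and split it into simple word tokens.
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    # A Counter returns 0 for words it has never seen.
    return [words[k] for k in keywords]


# Hypothetical call notes and keywords, made up for this example.
notes = [
    "Customer called about a late refund. Refund still not received.",
    "Asked how to reset password; password email never arrived.",
]
keywords = ["refund", "password", "late"]

# Each note becomes one row of numbers, one column per keyword.
rows = [keyword_counts(note, keywords) for note in notes]
```

Each row is now a fixed-length list of numbers that can sit alongside structured fields such as dollar amounts.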

Most modern data environments are a very messy mix of structured and unstructured data, and usually that data is stored in huge quantities. It’s easy to get overwhelmed when you look at the sheer scope of the data that is available to you.

Fortunately, the problem is not insurmountable if you approach your data in the right way.

First and foremost, you must have a clear idea of what you are trying to accomplish with this data. What are your objectives? Once you know your objectives, it’s easier to see which data won’t make sense for this particular problem set.

Next, you’ll need to take a sample. You’re never going to analyze all of the data that’s available to you. You only need to put together a good enough representation of the whole solution space.
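One simple way to take a sample is to draw records at random, sketched below in Python. The function name take_sample, the 1% fraction, and the stand-in records are assumptions made for illustration. A plain random draw is only one option, and it may under-represent rare cases that matter to your objective.

```python
import random


def take_sample(records, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    # A fixed seed makes the same sample come back on every run.
    rng = random.Random(seed)
    # Always keep at least one record, even for tiny fractions.
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


# Stand-in for a large data set: 10,000 record IDs.
records = list(range(10_000))

# Keep roughly 1% of the records for analysis.
sample = take_sample(records, 0.01)
```

Fixing the seed is a deliberate choice here: it lets you rebuild the same sample later when you need to check or repeat your work.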

In fact, one of the biggest mistakes modelers make is going into the data assuming that “more” is better. In truth, piling on more and more data, much of it irrelevant to your objective, is a good way to ensure the failure of your model.

Fortunately, knowing your targets makes it far easier to decide what you’re going to include. You’ll dive in, pick the data that is relevant to your objective, and then work with that data (and only that data).

Don’t jump into the data without setting your course. That’s like starting a book on Chapter 5. Either way, you all but ensure you won’t have a clue what’s going on, which means you’re likely to create an even bigger mess than the one you started with.
