
Data Preparation: A Vital Step in Predictive Analytics

Do you know the difference between a great modeler and a mediocre modeler? Many people assume that modeling greatness is wrapped up in the ability to build a better algorithm.

Building a better algorithm is like building a faster rocket ship. It’s great, as long as you’re pointing the ship in the right direction. Nobody wants to move faster if they’re going in the wrong direction, but this is precisely what happens in many organizations. Data preparation is a big part of keeping that rocket pointed in the right direction.

There are two issues that you might encounter during your data preparation phase.

The issue of low-quality data.

The professionals at TMA are often asked whether predictive analytics is possible with “low quality” data. The short answer is yes, of course it is. The data is what it is, and a modeler can rarely wait for higher quality data to present itself. Instead, the modeler will have to engage with the data, cleaning it up so that it may be used.

There is, of course, a caveat that you should be aware of: you never want to clean your data to the point where you can’t deploy the model in a live environment. You should always do your modeling on data as it will appear in the environment where the model is expected to perform.
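
One way to respect this caveat is to learn each cleaning step from the training data once and then apply it, unchanged, to live records. The sketch below is only an illustration under stated assumptions, not TMA's method: the `income` field and the helpers `fit_median` and `clean_record` are hypothetical names invented for this example.

```python
from statistics import median


def fit_median(rows, field):
    """Learn a fill value from training rows, ignoring missing entries."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return median(values)


def clean_record(record, field, fill_value):
    """Return a copy of the record with a missing field filled in.

    Relies only on the stored fill value, so it also works on a
    single live record that arrives without the rest of the data.
    """
    cleaned = dict(record)
    if cleaned.get(field) is None:
        cleaned[field] = fill_value
    return cleaned


# Hypothetical training data with one missing value.
training_rows = [
    {"income": 40000},
    {"income": None},
    {"income": 60000},
    {"income": 50000},
]

# Learn the cleaning rule once, from the training data.
income_fill = fit_median(training_rows, "income")

# Apply the same rule to the training data and to a live record.
cleaned_training = [clean_record(r, "income", income_fill) for r in training_rows]
live_record = clean_record({"income": None}, "income", income_fill)
```

Because the live record is cleaned with exactly the rule used in training, the model sees data in the same shape in both places.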

The issue of low-quantity data.

TMA professionals also hear a lot of concerns about the quantity of data that any given organization might possess. There is always a fear that there just won’t be enough data to complete meaningful projects.

Yes, you do need sufficient data to complete your project.

No, this isn’t often a real problem for the modern organization. The typical organization will have far more data than is necessary to complete most projects.

The modern organization usually has more data than it can handle.

It’s all about getting to know your data.

If either of these issues is poised to become a problem, you will learn about it during the data prep phase. This phase is all about getting to know the data and its limitations so that the data may be applied to the problem at hand. You can’t skip this step. You must understand what your data can do.
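
As a first pass at getting to know a data set, an analyst might count the rows and measure how much of each field is missing, which speaks to both the quantity and the quality question. Here is a minimal sketch, assuming records arrive as Python dictionaries; the `profile` function and the sample fields are hypothetical.

```python
def profile(rows):
    """Summarize the row count and the share of missing values per field."""
    fields = sorted({name for row in rows for name in row})
    total = len(rows)
    missing = {}
    for name in fields:
        # Treat absent keys, None, and empty strings as missing.
        n_missing = sum(1 for row in rows if row.get(name) in (None, ""))
        missing[name] = n_missing / total if total else 0.0
    return {"rows": total, "missing_share": missing}


# Hypothetical sample with a few typical gaps.
sample = [
    {"age": 34, "region": "west"},
    {"age": None, "region": "east"},
    {"age": 51, "region": ""},
    {"age": 29},
]
report = profile(sample)
```

A report like this will not fix anything on its own, but it shows where the data is thin before any modeling begins.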

This is yet another reason why you can’t just dump data into a software program or an algorithm if you expect to get good results.

But it’s usually possible to solve the problems inherent in the data. You shouldn’t let the state of the data stop you. You should just accept it as part of the process.

All data is dirty. It’s up to you, as the analyst, to improve it.
