Guidance and results for the data-rich, yet information-poor.

PREDICTIVE ANALYTICS & DATA MINING:
MODEL DEVELOPMENT

A Tactical Drill-Down Of Process,
Methods, Tools And Techniques


ABOUT THIS COURSE

The Modeling Agency’s “Model Development” course presents a deep dive into the data mining process at a tactical level. Attendees will observe demonstrations of machine learning methods and computer-guided analytical techniques for extracting and interpreting complex patterns and relationships from large volumes of data. If you desire an intensive tactical orientation to data mining concepts, tools, techniques and supporting methods, then this event is designed for you.

This vendor-neutral course broadly covers data-driven information discovery techniques and model-building tactics without restriction to any particular modeling tool. Popular open-source and commercial packages are leveraged to illustrate methods, but not to showcase the tools. There are no prerequisites for this course. However, participants will benefit from reviewing the CRISP-DM guide ahead of the training.

Each course in the series is designed to be taken independently or as a natural progression from tactics to strategy and practice. View the course series overview page to compare the two primary orientations and target the most fitting agenda for your experience, situation and objectives.


WHO SHOULD ATTEND

IT PROFESSIONALS: who wish to expand their skills in this increasingly visible area within the corporate IT agenda

PROJECT LEADERS: who must report on developmental progress, resource requirements and system performance

DECISION SUPPORT SYSTEM ARCHITECTS: who need an understanding of the infrastructure required to support a data mining solution

BUSINESS ANALYSTS: who must develop and interpret the models, communicate the results and make actionable recommendations

FUNCTIONAL ANALYSTS: Customer Relationship Managers, Risk Analysts, Business Forecasters, Statistical Analysts, Inventory Flow Analysts, Direct Marketing Analysts, Medical Diagnostic Analysts, Market Timers, e-commerce System Architects and Web Data Analysts


BENEFITS OF ATTENDING

  • Vendor-neutral exposure to tools and techniques that will place you months ahead in method planning and product surveying
  • Examine which methods and tools are most effective for your needs
  • Avoid pitfalls in data preparation, modeling, and results interpretation
  • Leave with resources, contacts and actionable plans to substantially increase your analysis capabilities while minimizing dead ends

THE BUSINESS CHALLENGE

The rapid emergence of electronic data processing and collection methods has led some to call recent times the “Information Age.” However, it may be more accurately termed “The Age of the Data Glut.” Most businesses either possess a large database or have access to one. These databases contain so much data that it becomes very difficult to understand just what the data is telling us.

There is hardly a transaction that does not generate a computer record somewhere. All of this data holds meaning for making better business decisions and anticipating customer needs and preferences. But how do you discover those needs and preferences in a database that contains gigabytes of seemingly incomprehensible numbers and facts? Data mining and predictive analytics do just that.

The intent of this course is to give attendees a stronger grasp of data mining techniques, a solid understanding of how various methods and tools apply to different kinds of data-intensive problems, and practical guidance on overcoming the limitations that cause predictive models to underperform.


WHAT YOU WILL LEARN

  • The data mining process and general implementation
  • How to prepare raw data and benefit from visualization
  • Various data mining methods and how they compare
  • Advanced model building techniques
  • Results analysis and validation
  • Technology and product selection
  • Solution integration, ongoing performance and maintenance
  • Where to begin and how to obtain resources and support

WHAT MAKES THIS COURSE UNIQUE

This course does not restrict or skew the presentation of data mining methods through a single product. Rather, it considers the full range of resources from a vendor-neutral position. The instructor possesses a wealth of pragmatic experience applying data mining technology to real-world applications across industries. Throughout, the course insists on making predictive analytics constructive and interpretable in a business or organizational setting.

In addition, live modeling demonstrations projected from the presenter’s machine will support the instructional sessions. The demonstrations will highlight superior performance as well as pitfalls. The instructor will show how to evaluate various packages based on strengths, limitations, value and general performance.


COURSE OUTLINE

INTRODUCTION

  • What you will get in this course

  • What is PA/DM?

    • Definition

    • Related terms and fields

      • Machine learning

      • Computer-aided pattern discovery

      • Business analytics and statistics

      • Others you have heard?

    • Examples

    • Differences

  • How can you develop PA/DM opportunities?

    • Generative questions

    • Examples

  • Nuts and bolts of a project

    • Big Picture: Introduction to CRISP-DM

      • What is it?  What is it not?

      • Why do we care? Why use it? What is it good for?

    • Example: Tour of CRISP-DM in real-world context

    • Team Exercise

  • One Practitioner’s View

    • Regarding PA/DM: What’s hype and what isn’t?

    • How to be successful with PA/DM

    • Tools and products

    • People matter


CRISP-DM METHODOLOGY: Parts 3, 4, 5

  • Highlight CRISP-DM 1, 2, 6
    (CRISP phases 1, 2 and 6 are detailed in the Strategic Implementation course)

    • Business understanding

    • Data understanding

    • Deployment

  • Data Preparation (CRISP 3)

    • Rows: Select data

      • How much data?

      • Rows: Selecting the “unit of analysis”

      • Determine what the record will look like

      • Determine how many records we have to work with

      • Site selection example

    • Rows: Defining the population / outcome of interest

    • Modeling Goals

      • Simple: Response vs. Non-Response

      • Uplift / Incremental Lift / Net Lift Modeling: Identifying
        those most receptive to a treatment or offer

    • Rows: Sampling methods / oversampling

    • Rows: Exclusions / rules of thumb

    • Columns: Identifying types

      • Need definitions (from clients or internal) so that we
        understand what the data represents.  Don’t assume
        that an element isn’t important

      • Categorical / Nominal (what does null mean?)

      • Ordinal

      • Interval / Ratio

      • Date / Time

      • Sub-Types (money, count, geo, ID, etc., and why care?)

    • Columns: Appropriate statistics and visualizations

      • Univariate

      • Multivariate

    • Columns: Selection for modeling

      • See “Clean Data” for pre-modeling elimination of
        redundant, constant, etc. columns

      • Final selection is done during the Modeling phase

    • Build and Execute Transformations

      • Sources: Household File; Demographics; Derived Variables

      • Transformations: Counts, Category, Binary, Logarithmic, etc.

    • Document the above in a “Scorecard”

  • Modeling (CRISP 4)

    • Select modeling technique

      • Taxonomies: An overview

        • Supervised vs. Unsupervised

        • Descriptive vs. Predictive

        • Classification vs. Estimation

      • Supervised — Constellation of methods with pros and cons

      • Classification

        • Decision Trees

        • Logistic Regression

        • Neural Networks

        • K-Nearest Neighbor

      • Prediction

        • Linear Regression

        • Neural Networks

        • MARS

      • Exercise: Scenario revisited — What method(s) do we choose?

    • Unsupervised — More methods with pros and cons

      • Segmentation / Clustering

        • Hierarchical clustering

        • K-Means

        • Decision trees

      • Association Rules

    • Team Exercise: Come up with an expert-derived decision tree to
      make a selection for supervised problems

    • Advanced Topics

      • Ensemble / Hybrid Models

      • Bagging

      • Boosting

    • Parting remarks

      • Models should be as simple as possible, but no simpler

      • Why not both?  (a low-res descriptive model and
        a high-res opaque accuracy model)

    • Generate test design

      • Data segregation

      • Performance metrics: Whenever possible, go for the
        custom metric — “If you build it, they will come.”

    • Build Model

      • Use a tool, select a method, set parameters (if any),
        select candidate columns, select outcome (if supervised)

      • Variable selection techniques for supervised methods

      • Variable selection techniques for unsupervised methods

    • Assess Model (Tweaking)

      • Predictors

      • Manually removing or limiting

      • Forcing predictors

    • Structure

    • Profiles

    • Compared to What?

      • Baseline model comparison

      • Train/Test/Validation comparison

    • Scoring the model

      • What does scoring mean?

      • How is it different from building the model?

      • What are we looking for when scoring?

    • Final Product

      • Model(s)

      • Description(s)

      • Text Mining / Text Analysis

  • Evaluation (CRISP 5)

    • Evaluate results (from business perspective)

    • Prelude to business use presentation

      • Informal, low-risk setting

      • Poke holes early, before business presentation

    • Does the model or segmentation make sense?

    • Does it contradict or reinforce the standard “lore”?

    • Get support and buy-in from potential champions

    • Candidate names for segments

    • Present results to business users or clients

      • Business users need to be convinced: models, segments and
        analyses need to be marketed!

      • Deployment will require change

        • To processes

        • To systems

        • To ingrained mindsets

      • Deployment costs (to each change area above)

      • Results must be framed as business value, not technical representations

      • Performance results — in business terms

      • Descriptions

        • No equations

        • Tell the story, paint the vision, what will life be like with
          or without the model in place?

    • Review Process

      • Follow-ups to the presentation

        • Anticipate follow-up issues in planning and estimates

        • Revisions to the model(s) or segments based upon feedback

        • Final quality assurance

    • Determine next steps

      • Are you done?

      • Will the model(s) be deployed?  Why or why not?

      • Document!

      • Lessons learned meeting

    • Final Product

    • Consulting Exercise


WRAP-UP AND PARTING THOUGHTS

  • Final Q&A

    • Springboard exercise

  • PA/DM Philosophy

    • Understand the problem

    • Understand the data

    • Work on problems with specific business goals,
      specific hypotheses to be tested.  Do NOT go
      prospecting for “data mining nuggets.”

  • Next Steps

    • Proceed to the Strategic Implementation course

    • Certification Exam (for those who complete the series)

    • Product training courses

    • Keep learning!

    • Supplementary materials and resources

    • Conferences and communities

    • Get started on a project!