Thanks to everyone who tuned into our webinar: Increasing Customer Acquisition and Cross-sell with Predictive Analytics.  A few of you had questions and we’d like to take the chance to answer them here:

1.  How long does it take to get a Predictive Analytics project started, and how long is the typical duration of a project from start to finish?

A typical project starts with a Business Understanding phase that includes about a week of meetings at our client’s location. During this kickoff phase, we:

  • Review our client’s pressing business needs;
  • Gain an early understanding of their data and business processes; and
  • Identify a good candidate starting project.

A good starting project is one that solves an important problem, has data available for analysis, and has a relatively high expected return on investment so that our efforts provide real value to our clients as soon as possible.

Additionally, this project should include some customer communications or offers we can test. The time needed to run these tests may be in addition to the time spent on the advanced analytics work. We help our clients assess campaign success and plan next steps to increase effectiveness.

A full project takes 6 to 12 weeks depending on the availability and complexity of the data, the scope of the advanced analytics effort, and the kinds of analytics performed.

2.  Is there a certain type or amount of data that we need to have in order to create a Predictive Model?

Most of the time, data analysts prefer more data rather than less. That means more columns and more records. It’s best to gather more data than you think you will need so the analysts have options in terms of how they design their analyses.

It’s also a best practice to include analysts in early discussions about data gathering, so you’ll get an insider’s perspective on how to stage the data before the analytics work begins.

The total number of data columns used depends on a number of factors, such as:

  • The business question being analyzed
  • The time frame over which the data is valid
  • The number of competing theories being assessed in the analytics effort
  • The number of data sources available
  • How much additional information can be derived from the data available

For example, if the data consists of transactions, we can start by looking at the volume, value, and timing of transactions, various relationships within the transactions, and how they break down by product group. With this, we can create both a modeling dataset and a validation dataset, which allows us to effectively run and test the predictive models.
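As a rough illustration of deriving volume, value, timing, and product-group breakdowns from raw transactions, here is a minimal Python sketch. The record layout, the sample rows, and the `summarize` function are hypothetical and chosen only for illustration; real transaction data will look different.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (customer_id, transaction_date, amount, product_group)
transactions = [
    ("C1", date(2024, 1, 5), 40.0, "grocery"),
    ("C1", date(2024, 2, 9), 15.5, "pharmacy"),
    ("C2", date(2024, 1, 20), 120.0, "electronics"),
    ("C1", date(2024, 3, 1), 22.0, "grocery"),
]


def summarize(records):
    """Aggregate per-customer volume, value, timing, and spend by product group."""
    summary = {}
    for customer_id, when, amount, group in records:
        row = summary.setdefault(customer_id, {
            "volume": 0,                     # number of transactions
            "value": 0.0,                    # total amount spent
            "first": when,                   # earliest transaction date
            "last": when,                    # most recent transaction date
            "by_group": defaultdict(float),  # spend per product group
        })
        row["volume"] += 1
        row["value"] += amount
        row["first"] = min(row["first"], when)
        row["last"] = max(row["last"], when)
        row["by_group"][group] += amount
    return summary


features = summarize(transactions)
```

Each entry in `features` becomes one candidate row of a modeling dataset, with the derived columns serving as input variables.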

To create a model, the minimum number of cases is based on common statistical considerations. The equation basically looks like this: (5 × number of input variables) × (1 modeling dataset + 1 validation dataset). For example, if we want to assess a theory about offer acceptance and have 10 variables, then the minimum dataset would have (5 × 10) × (1 + 1) = 50 × 2 = 100 cases. This gives us 50 cases for the modeling dataset and 50 cases for the validation dataset.
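The rule of thumb above is simple enough to write as a small helper. This is only a sketch: the function name `min_cases` and its parameter names are made up for illustration, and the defaults mirror the numbers in the example (5 cases per variable, 2 datasets).

```python
def min_cases(n_input_variables, cases_per_variable=5, n_datasets=2):
    """Minimum cases = (cases per variable x input variables) x number of datasets."""
    return cases_per_variable * n_input_variables * n_datasets


total = min_cases(10)     # 10 input variables, as in the offer-acceptance example
per_dataset = total // 2  # cases in each of the modeling and validation datasets
print(total, per_dataset)
```

With 10 input variables this gives 100 cases in total, or 50 per dataset, matching the worked example.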

In most of the work we do, our clients have thousands or even millions of customers, so we use quite a bit more data. For very large organizations, we pull a 10% random sample and use that as the basis for the modeling dataset. A 10% sample of a database is usually sufficient, representative, and credible to all stakeholders.
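One way to picture the sampling step is the sketch below, which draws a 10% random sample and splits it into modeling and validation halves. The function name `sample_and_split`, the fixed seed, and the even 50/50 split are assumptions made for illustration, not a description of any specific client workflow.

```python
import random


def sample_and_split(records, fraction=0.10, seed=42):
    """Draw a random sample of the records, then split it into two halves."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_size = min(len(records), max(2, round(len(records) * fraction)))
    sample = rng.sample(records, sample_size)  # sampling without replacement
    midpoint = len(sample) // 2
    return sample[:midpoint], sample[midpoint:]


# Hypothetical customer base of 10,000 customer IDs
customer_ids = list(range(1, 10001))
modeling, validation = sample_and_split(customer_ids)
```

Because the sample is drawn without replacement, no customer appears in both the modeling and the validation set.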

If you have questions or would like to discuss your current needs and how we can help you maximize your efforts using predictive analytics, please call us at 1-877.676.3743 or send email to