In our recent post, “Discover the ROI of predictive analytics with 3 projects,” we highlighted some powerful quick wins you can achieve using IBM SPSS Modeler to solve key business problems such as reducing churn and increasing cross-sell and up-sell. Want a closer look at how Modeler does its magic? Let’s pop the hood and take a quick tour.

We’ll use a project for predicting customer churn, which we’ve found is one of the best ways to learn Modeler. Below you’ll see an example of an analytical stream that predicts churn. We’re using historical customer data from two databases, where each row represents one customer. The building blocks of the stream are called “nodes,” and at each node Modeler performs an operation on the data. The data contains multiple variables, including the one we want to predict: “Churn,” which has two values, “Churn” and “No Churn.”

Here’s a breakdown of the nodes in the stream for predicting customer churn, from left to right:
- CRM database and Transactional data nodes: These source nodes specify where the data comes from. Modeler can read from a variety of sources, such as SQL databases, SAS files, Excel spreadsheets, XML, and .csv files.
- Merge node: Joins the two data sets together for modeling, with one record for each customer.
- Type node: Labels variables as continuous (numeric) or categorical, and identifies the target variable. In this case, the target variable is “Churn,” with two values: “Churn” and “No Churn.”
- Partition node: Partitions the data into two sets: a training set we use to build the model and a testing set we use to evaluate its accuracy.
- CHAID Churn node (pentagon): CHAID (CHi-squared Automatic Interaction Detection) is the decision-tree classification algorithm we use to build a model that predicts our Churn variable. Modeler offers a wide range of modeling algorithms, including automated modeling nodes that build and compare several models at once.
- CHAID Churn model node (gold nugget): This is the decision tree model created by the CHAID algorithm to predict Churn. It assigns each record in the data a prediction of “Churn” or “No Churn” and a confidence score for each prediction.
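- Evaluation node: This node compares how well the model performs on the training and testing sets using a cumulative gains or lift chart, which shows what share of the actual churners the model captures as you target a growing share of customers ranked by their predicted likelihood of churning.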
- Database node: We can use our model to score new customer data and write the predictions back to a database.
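The Merge and Partition steps described above can be sketched in plain Python outside Modeler. This is a minimal illustration, not how Modeler implements these nodes: the field names (`customer_id`, `tenure_months`, `plan`, `monthly_spend`, `churn`) and sample values are hypothetical, each source is assumed to already hold one row per customer (raw transaction logs would need aggregating first), and the split is a simple seeded shuffle-and-cut with an assumed 70/30 ratio.

```python
import random

# Hypothetical sample records; field names and values are illustrative only.
crm_rows = [
    {"customer_id": 1, "tenure_months": 34, "plan": "premium"},
    {"customer_id": 2, "tenure_months": 3, "plan": "basic"},
    {"customer_id": 3, "tenure_months": 12, "plan": "basic"},
    {"customer_id": 4, "tenure_months": 48, "plan": "premium"},
]
transaction_rows = [
    {"customer_id": 1, "monthly_spend": 80.0, "churn": "No Churn"},
    {"customer_id": 2, "monthly_spend": 20.0, "churn": "Churn"},
    {"customer_id": 3, "monthly_spend": 35.0, "churn": "Churn"},
    {"customer_id": 4, "monthly_spend": 95.0, "churn": "No Churn"},
]


def merge_on_key(left, right, key):
    """Inner-join two lists of dicts on `key`, giving one merged record per customer."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


def partition(records, train_fraction=0.7, seed=42):
    """Shuffle the records with a fixed seed and cut them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


customers = merge_on_key(crm_rows, transaction_rows, "customer_id")
train, test = partition(customers)
print(len(customers), len(train), len(test))
```

Fixing the seed makes the split reproducible, so the model can be rebuilt and compared against the same testing set.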
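To give a feel for what CHAID does at each split, the sketch below computes the Pearson chi-squared statistic between each candidate predictor and the target, then picks the predictor with the strongest association. It covers only the core idea: full CHAID also merges categories that behave alike, ranks predictors by Bonferroni-adjusted p-values rather than raw statistics, bins continuous predictors, and repeats the process recursively on each branch. The records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical training records: (plan, region, churn outcome).
rows = [
    ("basic", "north", "Churn"),
    ("basic", "south", "Churn"),
    ("basic", "north", "Churn"),
    ("basic", "south", "No Churn"),
    ("premium", "north", "No Churn"),
    ("premium", "south", "No Churn"),
    ("premium", "north", "No Churn"),
    ("premium", "south", "No Churn"),
]
records = [{"plan": p, "region": r, "churn": c} for p, r, c in rows]


def chi_squared(records, predictor, target):
    """Pearson chi-squared statistic for the predictor-by-target contingency table."""
    observed = Counter((r[predictor], r[target]) for r in records)
    row_totals = Counter(r[predictor] for r in records)
    col_totals = Counter(r[target] for r in records)
    n = len(records)
    stat = 0.0
    for row_value, row_total in row_totals.items():
        for col_value, col_total in col_totals.items():
            # Cell count we would expect if predictor and target were independent.
            expected = row_total * col_total / n
            stat += (observed[(row_value, col_value)] - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Pick the predictor most strongly associated with the target."""
    return max(predictors, key=lambda p: chi_squared(records, p, target))


for name in ("plan", "region"):
    print(name, round(chi_squared(records, name, "churn"), 3))
print("split on:", best_split(records, ["plan", "region"], "churn"))
```

On this toy data, `plan` scores 4.8 and `region` about 0.53, so the tree would split on `plan` first.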
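A cumulative gains curve like the one the Evaluation node draws can be computed directly from model scores. The sketch assumes a hypothetical test set in which each customer has a predicted churn probability and a known outcome; it ranks customers from highest to lowest score and tracks what fraction of the actual churners has been captured at each cutoff. Lift at a given cutoff is that captured fraction divided by the fraction of customers targeted.

```python
def cumulative_gains(scored, positive="Churn"):
    """Return (fraction of customers targeted, fraction of churners captured)
    points, ranking customers from highest to lowest predicted churn score."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, actual in ranked if actual == positive)
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, actual) in enumerate(ranked, start=1):
        if actual == positive:
            captured += 1
        points.append((i / len(ranked), captured / total_positives))
    return points


# Hypothetical test-set results: (predicted churn probability, actual outcome).
scored = [
    (0.91, "Churn"), (0.85, "Churn"), (0.62, "No Churn"), (0.55, "Churn"),
    (0.30, "No Churn"), (0.20, "No Churn"), (0.15, "Churn"),
    (0.05, "No Churn"), (0.04, "No Churn"), (0.02, "No Churn"),
]

# Skip the (0, 0) origin point so lift never divides by zero.
for targeted, captured in cumulative_gains(scored)[1:]:
    lift = captured / targeted
    print(f"top {targeted:.0%}: {captured:.0%} of churners, lift {lift:.2f}")
```

A model with no predictive power would follow the diagonal, capturing 20% of churners in the top 20% of customers; here the top 20% captures half of them, a lift of 2.5.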
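The final scoring step can likewise be sketched with Python's built-in `sqlite3` module standing in for the target database. The `score` function is a hypothetical, hand-written two-level tree with made-up thresholds and confidence values, standing in for the rules a trained CHAID model would contain; the `churn_scores` table and its columns are also assumptions.

```python
import sqlite3


def score(customer):
    """Hypothetical two-level tree returning (prediction, confidence).
    Thresholds and confidences are made up; in a real tree, confidence is
    typically the share of training records in the leaf with the predicted class."""
    if customer["plan"] == "basic":
        if customer["tenure_months"] < 12:
            return "Churn", 0.82
        return "No Churn", 0.64
    return "No Churn", 0.91


new_customers = [
    {"customer_id": 101, "plan": "basic", "tenure_months": 4},
    {"customer_id": 102, "plan": "basic", "tenure_months": 30},
    {"customer_id": 103, "plan": "premium", "tenure_months": 7},
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the production database
conn.execute(
    "CREATE TABLE churn_scores ("
    "customer_id INTEGER PRIMARY KEY, prediction TEXT, confidence REAL)"
)
rows = [(c["customer_id"], *score(c)) for c in new_customers]
conn.executemany("INSERT INTO churn_scores VALUES (?, ?, ?)", rows)
conn.commit()

# Customers predicted to churn, most confident predictions first.
at_risk = conn.execute(
    "SELECT customer_id, confidence FROM churn_scores "
    "WHERE prediction = 'Churn' ORDER BY confidence DESC"
).fetchall()
print(at_risk)  # [(101, 0.82)]
```

The final query produces the kind of list a Marketing team could act on: customers predicted to churn, ordered by how confident the model is.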
Once the predictions are in your database, you can turn them into actionable insights for other teams. To prevent churn, for example, your Marketing team could identify which customers are most likely to leave and engage them with targeted incentives to stay.