Excel at Data Mining – Your First Predictions
Today, I’m going to show how to start making your first predictions in less than 5 minutes with the Microsoft Excel data mining add-in. The assumption of this post is that you have already installed the data mining add-in for Excel and configured the add-in to be pointed at an instance of SQL Server to which you have access rights.
For the example, we will be using a tutorial spreadsheet that can be found on Codeplex here.
Our goal for this post is to create a set of predictions for new customers using a Logistic Regression models based upon old customers.
Read below if you are unable to view the video walkthrough.
Step 1: Creating the Logistic Regression Model
We go to the DATA MINING tab, the Data Modeling group, and select the Advanced drop down. From here select Create Mining Structure to open the Mining Structure Wizard. Click Next to get past the title screen on the wizard. From the drop down, select ‘Table Analysis Tools Sample’!’Table2’. Click Next three times and then click Finish.
Now, go back to the Advanced drop down and select Add Model to Structure. Click Next to get past the wizard title screen. Select the Table Structure that you just created and click Next. From the drop down, select Logistic Regression Model and click Next. For Purchased Bike, changed the drop down to Input and Predict. Click Next and Finish. You should see a Model Viewer that looks like this:
Step 2: Creating Prediction
This section will show us how to use the Excel DM Query Wizard to create a DMX query that will allow us to use the model that we just created to predict whether new customers will buy a bicycle or not.
Let’s go to the DATA MINING tab, the Model Usage group, and select Query. Click Next. Select the Logistic Regression Model that you just created and click Next. From the table dropdown, select ‘New Customers’!’Table 17’ and click Next.
Notice on the Specify Relation screen, the Wizard matched columns between the Mining Structure and New Customers by name. However, we are missing two columns.
The first is a match for the Income column in the mining structure. From the drop down, select Yearly Income. The second is the Purchased Bike column, which we will leave blank because we are trying to predict this data for New Customers. Click Next.
Click on Add Output.
Select Purchased Bike, PredictProbability, and then Yes. Click OK.
Now we can see part of the MDX query expression that Excel created.
Let’s change the name to Probability of Bike Purchase. Click Next.
We will leave the selection on Append to the input data. Click Finish.
Now if you go to the New Customers tab, you will an appended column on the end with the probability that each new customer will buy a bike.
And that is how you use a mining model to create predictions in less than five minutes.
Billy Decker is a consultant at StatSlice Systems. He graduated with a dual degree in Physics and Mathematics from the University of Texas at Austin and received his Masters Degree in Physics from the University of Texas at Arlington. He previously worked for Global Technical Services as a Senior Training Analyst and Bell Hellicopter as an Instructional Designer. His technical experience includes, but is not limited to, SQL, SAP, Business Objects, QlikView, and Sharepoint.
You can subscribe to our RSS feed.