Excel at Data Mining – Your First Model
Today, I’m going to show how to start data mining in less than 5 minutes with the Microsoft Excel data mining add-in. The assumption of this post is that you have already installed the data mining add-in for Excel and configured the add-in to be pointed at an instance of SQL Server to which you have access rights.
For the example, we will be using a tutorial spreadsheet that can be found on Codeplex here.
Our goal for this post is to create a Naïve Bayes Model that will predict whether or not a particular customer will purchase a bike.
Step 1: Create the Mining Structure
We go to the DATA MINING tab, the Data Modeling group, and select the Advanced drop down.
From here, we will select the Create Mining Structure option. In the Create Mining Structure Wizard that appears, we select the Data range radio button and click the range selector button. On the tab named ‘Table Analysis Tools Sample’, we select all the data including the headers. Back at the wizard, we make sure that My Data Range Has Headers is selected and hit Next.
Now, we are selecting which columns that we want to include in our mining structure, select Next.
Next, we will select how much of the data we selected earlier will be used for testing and how much will be used for training, select Next.
And finally, we name the structure ‘Bike_MS’, select Finish.
Step 2: Create the Mining Model
Going to the same Advanced that we used in Step 1, select Add Model to Structure.
Select ‘Bike_MS’ and then Next.
From the dropdown, select Microsoft Naïve Bayes and then Next.
Here, we are selecting how we want to use the columns in our mining structure. Notice that ‘Age’ and ‘Income’ are both automatically set to Do Not Use. The reason for this is that the Naïve Bayes Model only deals with discrete values. If we wanted to use either of those attributes, we would need to discretize them by using age or income ranges. The only change that we making here is to switch ‘Purchased Bike’ to Input and Predict then select Next.
Finally, we name our new model ‘Bike_MS – Bayes’, select Finish.
And that’s it – you have officially created a data mining that can be used to predict whether a particular customer will buy a bike or not. For more practice and exploration, I highly recommend using the spreadsheet from Codeplex as it has several different simple sets of data that are ready made to set up different models and mining structures.
Billy Decker is a consultant at StatSlice Systems. He graduated with a dual degree in Physics and Mathematics from the University of Texas at Austin and received his Masters Degree in Physics from the University of Texas at Arlington. He previously worked for Global Technical Services as a Senior Training Analyst and Bell Hellicopter as an Instructional Designer. His technical experience includes, but is not limited to, SQL, SAP, Business Objects, QlikView, and Sharepoint.
You can subscribe to our RSS feed.