Excel at Data Mining – Time Series Forecasting
Today, I’m going to show you how to start data mining in less than 5 minutes with the Microsoft Excel data mining add-in. The assumption of this post is that you have already installed the data mining add-in for Excel and configured the add-in to be pointed at an instance of SQL Server to which you have access rights.
Read below if you are unable to view the video walkthrough.
For the example, we will be using a tutorial spreadsheet that can be found on Codeplex here.
Our goal for this post is to create a Forecasting model that will predict the trend of Bikes Sales in different regions.
Step 1: Using the Forecasting Wizard
We go to the DATA MINING tab, the Data Modeling group, and select the Forecasting drop down.
Now the Forecasting Wizard will appear, press next to go past the introduction screen. In the Select Data Source Screen that appears, we select the Data range radio button and click the range selector button. On the tab named ‘Forecasting’, we select all the data including the headers.
Back at the wizard, we make sure that My data range has headers is selected and hit Next.
The next part of the wizard allows us to choose our input columns for analysis. Here we will choose to analyze the revenue for Europe, North America and the Pacific. Check to make sure that the Time stamp drop down is set to Year_Month. Also note the Parameters button in the lower left hand corner. A detailed discussion of the settings available are beyond the scope of this blog post, but many different Forecasting options can be set here. Hit Next.
On the final screen of the wizard, we will set the Structure Name to ‘Revenue Forecasting Model’ and then click Finish.
Step 2: Exploring Our Forecasting Model
After the model finishes processing, the model browser appears on the screen. From here we can explore two different aspects of the Forecasting model that we created: a visual chart that displays the forecasts and the equations built by the model to make forecasts.
First note that the axes on our chart: the x-axis is the time scale set forth in our Month_Years column and the y-axis is not an absolute number but is a % change compared to initial values in our model.
The first thing we play with is the number of prediction steps. Let’s increase the steps from 5 to 10. We can see here some revenue trends for our three different regions in the forecasted region that is denoted by a slightly different color background and lines becoming dotted.
Let’s click on the chart at ‘200303’, we have no created a vertical line that will tell us the values of all three graphs on the right hand side under Mining Legend.
If you click in the forecasted region, the Mining Legend will display the values as ‘predicted’ instead of ‘actual’.
Now let’s turn on turn on the model deviations by selecting ‘Show Deviations’. We can see here that the forecasts for North America and Europe have a lot more uncertainty than the forecast for the Pacific.
Also, if you ever want to see the absolute numbers on the y-axis instead of the % change for a forecast, all you need to do is turn off the other forecasts by unselecting them in the middle of the mining browser.
We can also look at the equations that generate the different forecasts presented in Decision Tree form. An in depth discussion is beyond the scope of this post, but the basic premise is that multiple equations can be used for modeling if the behavior changes at a certain point.
For instance, if we select the North America Amount tree, we see it splits into two equations and the equations switched over at the point where the Europe Amount (-1) reaches the value 78832.766. The term Europe Amount(-N) means the value N time slices previous to the current North American amount.
So there’s how you start time series forecasting in less than 5 minutes. Look for my next data mining tutorial soon.
Billy Decker
Billy Decker is a consultant at StatSlice Systems. He graduated with a dual degree in Physics and Mathematics from the University of Texas at Austin and received his Masters Degree in Physics from the University of Texas at Arlington. He previously worked for Global Technical Services as a Senior Training Analyst and Bell Hellicopter as an Instructional Designer. His technical experience includes, but is not limited to, SQL, SAP, Business Objects, QlikView, and Sharepoint.