How to Use Power Query: Introducing Microsoft’s Newest BI Tool
If you’ve spent much time in Excel, odds are that at one point you had data in one table that you needed to combine with data from another table. There are few ways to attempt this manually which are time consuming and might not even work. One solution might be to attempt to line up the columns, first sorting both dataset, then copying the new dataset next to the original dataset. But if the number of rows do not match exactly, this will not work. Another alternative would be to use the vLookup function. This function is great for looking up values, but doesn’t work if you need to return multiple rows from the new dataset. It’s also quite possible you didn’t have the data in the first place e.g. due to a table located on a webpage that wouldn’t quite copy correctly or a report that wouldn’t export to Excel for some reason.
Power Query solves all those issues by making it easy for anyone to do basic business intelligence work right in Excel. You can connect to your source database and query it directly, then merge your tables based on column criteria. You can merge data from different databases, different database types, and even data stored in unstructured formats like other Excel worksheets, webpages (like Facebook or Yelp), or flat files.
One of the most interesting features of Power Query is that it can help you discover data by using the Online Search tool to find datasets on the internet. If you want to search for the players in the 2013 NFL Draft, you just need to type 2013 NFL Draft in the search box. Taking this function with a grain of salt, there are some limitations – mostly on the tools ability to find relevant datasets. It tends to show mostly Wikipedia data, and the dataset needs to exist in a table on a webpage. So, for this specific example, when I tried to find the active players in the NFL, this was not available to me even though each player has a page on the NFL website.
Power Query is also free if you already have Excel which makes it one of the most cost effective tools available to perform these tasks. You can download it from Microsoft here. You will need to have Excel installed for it to work.
I’ll be using the AdventureWorks SQL database that is available from Microsoft as my example in this walkthrough. There is a copy available for each version of SQL Server which you can get here from CodePlex.
To start, navigate to the Power Query tab, and click on from database to connect to our source.
Select from SQL Server Database because that’s what we will be using. Enter the server and database to connect. You’ll notice there is a place for a SQL statement which allows you to skip the next few steps by defining the base for your query right here. This would mostly be used by people who already know SQL to customize their data source.
When we select OK, Excel creates a navigator tab on the right side of your screen with the available tables. Hovering over a table displays the columns and a data preview of the table data.
If you have foreign key constraints on your database, Power Query will automatically add the connected tables. AdventureWorks, by default, has foreign key constraints, but I have dropped them so that later I can demonstrate how to merge datasets.
Double clicking the employee table displays a query editor. Here we can remove or merge columns, transform data and append other queries. The benefit of doing this within the query editor, instead of after the data is brought into Excel, is that when the data is refreshed from the source the changes will persist within the query.
As you can see above, I’ve removed the middle name column, merged first and last name into a column, and renamed it to first name. These steps are listed in order in the applied steps window, and they can be undone by selecting the step and clicking the X. Once we are done modifying our data, we can click apply and close to create a new Excel sheet with our table.
You’ll also notice that it saved the query so that it can be reused in other places by simply selecting it from the workbook queries pane. We’ll follow the same steps and create another table for Sales Territory.
What if we want the Sales Region column from sales territory to be listed along with the employee full name? You will notice that the Employee table and Sales Territory table have a column in common. We can use Power Query to merge data based on the values in these common columns.
Select Merge on the Power Query tab and a popup appears that lets you design your merge. Select the two tables you want to combine, and the column on which the join should happen. Here we’ve selected DimEmployee and DimSalesTerritory, and we are joining on the SalesTerritoryKey. If you only want to include rows that have matches in both tables, select only include matching rows. In our case we only have matching rows anyways, but if you did not, it would keep the rows from the first table, and produce empty cells in the rows where there was no match. Merge does not require a connection to a data source. You can even perform a merge on two tables that you generated manually.
When you select OK, a query editor window appears like before with the employee data, but this time there is a column at the end called new column. This column contains the information from the SalesTerritory table. Click the icon next to the column name to expand the column and select the columns you want.
In this case, I’ll remove all the columns from the employee table except full name, and then select Region, Country, and Group from our new column. After filtering out the rows where region was N/A, the result looks something like this.
With Power Query, the ability to import, modify, and combine data is available to anyone who has basic Excel skills. There’s no need to write complicated code because the graphical query designers allow you to point and click your way to merged datasets. With no cost, if you already have Excel, Power Query can be an effective tool for beginning data exploration.
Written by Guest Blogger, Danny Tipton (Consultant at StatSlice Systems)
You can subscribe to our RSS feed.