StatSlice Business Intelligence and Analytics | Business Intelligence and Analytics Dallas | Business Intelligence Training Dallas

Leveraging Google Docs with SSIS

Today, I’m going to show you how to access data from a Google Doc spreadsheet in SSIS so you can dump it to a table or use it in your ETL processing.

The idea for this came about when I had some metadata that needed to be updated by the client fairly regularly. For simplicity, we’re looking at banding together people based on ages. Google spreadsheet offers a simple web-based way for anyone with a web browser to update these age groups.

 

Installing the Google Data API

First, we need to download the API SDK itself:

The latest version can be found here:

http://code.google.com/p/google-gdata/downloads/list
jim-ssis-1Once installed, you’ll need to register the api with windows GAC so that SSIS can “see” the API. You can do this by dragging the DLLs

from C:\Program Files (x86)\Google\Google Data API SDK\Redist

to C:\Windows\assembly\

You’ll only need these 3 to connect to a spreadsheet:

• Google.GData.Extensions.dll
• Google.GData.Extensions.dll
• Google.GData.Spreadsheets.dll

 

 

 

That should register the DLL’s automatically, however if you have issues, try a reboot or Google how to manually register a DLL into the GAC.

 

Creating the Spreadsheet

In this example, we’re creating a spreadsheet with 3 different fields:

AgeName — the grouping displayed on a report
LowerBound — the lower bound of the Age group
UpperBound — the upper bound of the Age group

jim-ssis-2

 

There is one very important piece of information you also need to grab from the spreadsheet: the key. You can find it by looking in your address bar and grabbing the long text between key= and #gid.

In this example, the key is “0Algddiszrz0mdFo0akcwbEhFVVhIaGRFX3M4MF84LWc”.

 

Connecting SSIS to the Spreadsheet
Next, we’re going to create a SSIS script component that will connect to the spreadsheet and output the rows it finds. In BIDS, create a new package with a single dataflow in it and edit that dataflow.

jim-ssis-3Drag in a script component to the dataflow, and you’ll be presented with a component type dialog. Select source as the script type, and press OK.

 

 

 

 

 

 

You should now have a script component on your screen; double click it to edit it. Select Inputs and Outputs on the left, and as per our example spreadsheet, add 3 columns to output 0:

Name: AgeName DataType: string [DT_STR]
Name: LowerBound DataType: four_byte_signed integer [DT_I4]
Name: UpperBound DataType: four_byte_signed integer [DT_I4]

It should look something like this:

jim-ssis-4

 

 

 

 

 

 

 

 

 

 

 

 

The Code

jim-ssis-5Now that we have the Script component setup, we can actually edit the code and add what we need. First, remember to add references to the Google Data API DLLs we installed earlier (Client, Extensions, and Spreadsheet). You can do this by right clicking References and choosing Add Reference.

 

 

 

 

 

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Google.GData.Client;
using Google.GData.Extensions;
using Google.GData.Spreadsheets;
using System.Windows.Forms;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
SpreadsheetsService service;
WorksheetQuery query;
WorksheetFeed feed;
AtomEntry sheet;
AtomLink link;
ListQuery listQuery;
ListFeed listFeed;

public override void PreExecute()
{
base.PreExecute();

service = new SpreadsheetsService(“ETL Pull”);
//Username and password
service.setUserCredentials(“someone@google.com”, “*******”);
//your spreadsheet key goes in next line
query = new WorksheetQuery(“0Algddiszrz0mdFo0akcwbEhFVVhIaGRFX3M4MF84LWc”, “private”, “full”);
feed = service.Query(query);
sheet = feed.Entries[0];
link = sheet.Links.FindService(GDataSpreadsheetsNameTable.ListRel, null);
listQuery = new ListQuery(link.HRef.ToString());
listFeed = service.Query(listQuery);

}

public override void PostExecute()
{
base.PostExecute();
}

public override void CreateNewOutputRows()
{

foreach (ListEntry row in listFeed.Entries)
{
if (row.Elements[0].Value == “”) break;

Output0Buffer.AddRow();
Output0Buffer.agename = row.Elements[0].Value;
Output0Buffer.lowerbound = Int32.Parse(row.Elements[1].Value);
Output0Buffer.upperbound = Int32.Parse(row.Elements[2].Value);
}
}
}

 

Explanation of the Code

The two methods you should pay attention to are PreExecute() and CreateNewOutputRows().

PreExecute creates a connection to the Google spreadsheet, logs in with the username and password passed in, and eventually creates the listFeed object which we use to actually read the data.

Note: By accessing the spreadsheet this way, the listFeed object will automatically skip the first row in the spreadsheet so you don’t need to handle skipping it in your processing code.

CreateNewOutputRows simply goes through every row in the spreadsheet and adds a row to the SSIS components output.

Note: Line if (row.Elements[0].Value == “”) break; will stop processing if it even hits a row with no data in the AgeName column. I added this to protect the SSIS compenent from user error.

 

Where to Go From Here
Hopefully, you’ve learned how to bring Google spreadsheet data into SSIS. There are tons of possibilities you can add to extend this. I’ll list a few examples:

 

Security:
Now that the data is in Google spreadsheets, you can control who has access to the document quite easily and share it /remove access right from google docs.
You can also lock down the sheet or add input verification to the sheet for your client.

More worksheets:

feed.Entries is actually the list of all worksheets in a spreadsheet. You can extend this to work with multiple sheets/output as needed.

Web Forms:

A Google spreadsheet can be populated from a web form instead of giving access directly to the spreadsheet. As a quick run through on how, take a look here: https://support.google.com/drive/answer/87809?hl=en

Variables:

You can embed your credentials, the spreadsheet key, and other things into a SSIS variable instead. That way, if you change the spreadsheet you want to pull from or the user account to use depending on deployment (DEV, TEST, PROD), you can do so just by swapping a config.

 

Thanks for reading and look for my next post!

James Vogel

James Vogel

James Vogel is a Senior Consultant at StatSlice Systems. With over 10 years of consulting experience, his current focus is on ETL development, data modeling, and end user reporting tools. Jim finds enjoyment in teaching and getting end users involved with their data and reporting systems. He presents demos and trains clients on the current BI tools that StatSlice offers. He is focused on the delivery of applications that drive business decisions that blend the use of OLAP, ETL, and custom development. Jim has worn many hats on projects, from technical advisor to lead developer to trainer.

Subscribe

You can subscribe to our RSS feed.