Office Space: Tips and Tricks for Scripting Microsoft Office Applications


Welcome to Office Space, the column that offers tips and tricks for scripting Microsoft® Office applications. We’ll post new tips every Tuesday and Thursday; to see an archive of previous tips, visit the Office Space Archive. And if you have particular questions about Microsoft Office scripting, feel free to send them to scripter@microsoft.com (in English, if possible). We can’t promise to answer all the questions we receive, but we’ll do our best.

Using ADO to Query an Excel Spreadsheet

A month or so ago CBS aired the final episode of the beloved sitcom Everybody Loves Raymond, a show that had been ranked No. 1 in viewership for hundreds of years, despite the fact that at least one of the Scripting Guys never caught even the briefest glimpse of the program. (Of course, cultural literacy isn’t exactly this Scripting Guy’s forte: he’s never seen American Idol or Survivor either, and has absolutely no idea which one is Nelly and which one is P Diddy.) With Raymond gone CBS has a void to fill in its schedule, but the rest of the world has an empty spot to fill as well: now who are we all supposed to love?

Personally, we’d like to nominate Microsoft Excel as a candidate to fill that void. After all, what’s not to love about Excel? Do you need a traditional spreadsheet program, something that can help you with budgets, accounting and other financial-type things? Excel can do that. Do you need to make charts and graphs? Excel can do that. Create an organizational chart? Calculate statistics? Excel has you covered; Excel can do anything.

Of course, some of you might not be convinced of that. “Excel can do anything?” you muse. “Well, how about this: can Excel function as a flat-file database? Can you use ADO to access the information found in an Excel spreadsheet? Can you send SQL queries to Excel and get back information?”

We’ll put it this way: the Office Space column is published on the Internet, and obviously something couldn’t be published on the Internet if it wasn’t 100% true. Of course Excel can function as a flat-file database that can be accessed using ADO and SQL queries. And in today’s column we’ll explain how to do just that.

Let’s take a look at a very simple spreadsheet and then explain how we can access the data using ADO (ActiveX Data Objects). Here’s a rudimentary spreadsheet – named C:\Scripts\Test.xls – that consists of two columns, one labeled Name, the other labeled Number. To better ensure that your database queries work against an Excel spreadsheet, make sure your spreadsheets are set up in a similar fashion; that is, make row 1 a header row, start the data itself in row 2, and don’t skip any rows or columns. And to make coding easier, don’t include blank spaces in your headers; for example, use a column header like SocialSecurityNumber and don’t use a column header like Social Security Number. It just makes life much easier. (Incidentally, that’s true when using any kind of database, not just Excel.)

Here’s what our spreadsheet looks like:

[Image: the sample spreadsheet in Microsoft Excel, with a Name column and a Number column]

And, yes, we did put a lot of time and effort into creating this sample spreadsheet. All part of the service we offer.

But now for the big question: how do we access this data using ADO? Well, one way is to use a script that looks like this:

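On Error Resume Next

' ADO constants (a script has to define these itself)
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordset = CreateObject("ADODB.Recordset")

' Connect to the spreadsheet; HDR=Yes means row 1 holds the column headers
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=C:\Scripts\Test.xls;" & _
    "Extended Properties=""Excel 8.0;HDR=Yes;"";"

' Retrieve every row on the worksheet named Sheet1
objRecordset.Open "Select * FROM [Sheet1$]", _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

' Echo the Name and Number fields for each row
Do Until objRecordset.EOF
    Wscript.Echo objRecordset.Fields.Item("Name"), _
        objRecordset.Fields.Item("Number")
    objRecordset.MoveNext
Loop

' Release the recordset and the connection
objRecordset.Close
objConnection.Close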

We should probably start by pointing out that the first half of the script simply defines some constants and creates the two objects – ADODB.Connection and ADODB.Recordset – that are required to connect to and retrieve data from a data source. This is largely boilerplate code that you can use as-is in any ADO script, so we won’t discuss it in any detail; for more information you might take a look at the Working with Databases section of the Microsoft Windows 2000 Scripting Guide.

We’ll begin our discussion with this bit of code, which actually opens a connection to the Excel spreadsheet:

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=C:\Scripts\Test.xls;" & _
    "Extended Properties=""Excel 8.0;HDR=Yes;"";"

Again, this is largely boilerplate code. The only section you need to worry about is the Data Source, where you specify the path to your spreadsheet.

Note. What if the path to your spreadsheet includes spaces, something that can cause problems for ADO scripts working with text files (for more information, see this Hey, Scripting Guy! column)? In this case, there’s no problem whatsoever; just write out the entire file path, spaces and all:

Data Source=C:\Scripts\My Spreadsheet.xls

Incidentally, you should resist the temptation to change Excel 8.0 to whatever version of Excel you happen to be running on your machine. In this case the Excel 8.0 refers not to your version of Excel but to the spreadsheet file format (the .xls format introduced with Excel 97) that the Jet provider has been told to read. Leave the setting as Excel 8.0 and everything will be fine.

As long as we’re on the subject we should also mention that the code HDR=Yes simply indicates that our spreadsheet has a header row; if our spreadsheet didn’t have a header row we’d set HDR to No. But seeing as how we said you should always have a header row and seeing as how people always do what the Scripting Guys tell them to, well, this is a moot point.
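Still, for the rare soul who ignores our advice, here is a rough sketch of what changes when HDR is set to No. With no header row to borrow names from, the Jet provider names the columns F1, F2 and so on, and row 1 comes back as ordinary data. The file name C:\Scripts\NoHeader.xls is purely hypothetical; we are assuming it holds the same two columns as our sample, minus the header row.

```vbscript
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordset = CreateObject("ADODB.Recordset")

' HDR=No: row 1 is treated as data, and the columns are named F1, F2, ...
' (NoHeader.xls is a hypothetical file used only for illustration)
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=C:\Scripts\NoHeader.xls;" & _
    "Extended Properties=""Excel 8.0;HDR=No;"";"

objRecordset.Open "Select * FROM [Sheet1$]", _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

' Refer to the columns by their generated names rather than by header text
Do Until objRecordset.EOF
    Wscript.Echo objRecordset.Fields.Item("F1").Value, _
        objRecordset.Fields.Item("F2").Value
    objRecordset.MoveNext
Loop

objRecordset.Close
objConnection.Close
```

Everything else in the script stays the same; only the HDR setting and the field names differ.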

After making a connection to the data source we can use a SQL query to retrieve the information stored in that data source. Here’s the code we use to return a recordset consisting of all the rows in the spreadsheet:

objRecordset.Open "Select * FROM [Sheet1$]", _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

Again, there’s a lot of ADO boilerplate in here; the only part we’re really concerned about for now is the actual SQL query:

Select * FROM [Sheet1$]

This is a standard SQL query, one that selects all the fields (columns) in the table (worksheet). Notice that we specify the name of an individual worksheet in the query, the same way that we would specify an individual table name were we connecting to a database. Notice, too, that the worksheet name is enclosed in square brackets and that the actual name – Sheet1 – has a $ appended to it. Make sure you do both those things when writing your own ADO scripts for accessing data in a spreadsheet.
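If you aren’t sure which worksheet names a given workbook will answer to, one option is to ask the connection itself. The following is only a sketch: it uses the ADO OpenSchema method with the adSchemaTables constant (value 20) to list the "tables" in our sample file, and it assumes the same C:\Scripts\Test.xls path used throughout this column.

```vbscript
Const adSchemaTables = 20

Set objConnection = CreateObject("ADODB.Connection")

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=C:\Scripts\Test.xls;" & _
    "Extended Properties=""Excel 8.0;HDR=Yes;"";"

' OpenSchema returns a recordset with one row per table the provider can see
Set objSchema = objConnection.OpenSchema(adSchemaTables)

' Worksheets are reported with the trailing $ already attached
Do Until objSchema.EOF
    Wscript.Echo objSchema.Fields.Item("TABLE_NAME").Value
    objSchema.MoveNext
Loop

objSchema.Close
objConnection.Close
```

Whatever name comes back is the name to wrap in square brackets in the FROM clause of your query.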

The recordset returned by our query is exactly the same sort of thing we’d get back were we making a connection to, say, SQL Server. Because of that we can use these lines of code to simply report back the Name and Number for each record in the recordset, something that corresponds to each row in our spreadsheet:

Do Until objRecordset.EOF
    Wscript.Echo objRecordset.Fields.Item("Name"), _
        objRecordset.Fields.Item("Number")
    objRecordset.MoveNext
Loop

When we run the script we should get back a report that looks like this:

A 1
B 1
C 2
D 2
E 1
F 1

Yes, very cool, but some of you are skeptical: after all, couldn’t we retrieve this same information using a plain-old Excel script? Yes, we could, which means that in this case, there’s really no reason to use ADO at all.

So then why did we even bother writing this column? (That, by the way, is something we get asked pretty much any time we write a column.) Well, we admit it: if all you want to do is echo back every row in the spreadsheet there’s no point in using ADO (unless you’re more comfortable writing ADO scripts than you are writing Excel scripts). However, suppose we wanted to echo back only the rows where Number was equal to 2. Using a standard Excel script we’d need to examine each row, determine whether the Number column was equal to 2, and then echo (or not echo) the row based on that value. This isn’t necessarily hard, but it can be a bit cumbersome, especially if you need to look at values in more than one column (for example, if you were looking for all users with the title Administrative Assistant who work for the Finance department).
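For comparison, here is roughly what that row-by-row approach might look like as a plain Excel script. Treat it as a sketch rather than the one true way: it assumes the data lives on the first worksheet of C:\Scripts\Test.xls, that the headers are in row 1, and that the first blank cell in column 1 marks the end of the data.

```vbscript
Set objExcel = CreateObject("Excel.Application")
Set objWorkbook = objExcel.Workbooks.Open("C:\Scripts\Test.xls")
Set objSheet = objWorkbook.Worksheets(1)

' Row 1 is the header row, so the data starts in row 2
intRow = 2

' Walk down the sheet until we reach a blank Name cell
Do Until objSheet.Cells(intRow, 1).Value = ""
    ' Column 1 is Name, column 2 is Number; echo only the rows where Number is 2
    If objSheet.Cells(intRow, 2).Value = 2 Then
        Wscript.Echo objSheet.Cells(intRow, 1).Value, _
            objSheet.Cells(intRow, 2).Value
    End If
    intRow = intRow + 1
Loop

' Close the workbook without saving and shut down Excel
objWorkbook.Close False
objExcel.Quit
```

Every additional condition means another If test inside the loop, which is where things start to get cumbersome.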

With ADO, by contrast, we can get back this same information without having to individually examine each row in the spreadsheet. In fact, we can use pretty much the exact same script we just showed you; all we have to do is modify the SQL query. Because we want only those rows where the Number is equal to 2 we just need to use a SQL query like this:

objRecordset.Open "Select * FROM [Sheet1$] Where Number = 2", _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

That’s it; the preceding query will give us back data like this:

C 2
D 2

Why use ADO? There’s your answer: because you can easily use SQL queries to return a subset of the information found in the spreadsheet. That’s why you might want to use ADO to access an Excel file.
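The same idea stretches to the multi-column case mentioned earlier. As a sketch only, suppose a hypothetical workbook named C:\Scripts\Users.xls has a header row containing Name, Title and Department (neither the file nor those columns exist in our sample spreadsheet). Finding the Administrative Assistants in Finance then takes nothing more than a longer Where clause:

```vbscript
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordset = CreateObject("ADODB.Recordset")

' Users.xls is a hypothetical workbook with Name, Title and Department headers
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=C:\Scripts\Users.xls;" & _
    "Extended Properties=""Excel 8.0;HDR=Yes;"";"

' Text values go inside single quotes; And combines the two conditions
strQuery = "Select * FROM [Sheet1$] " & _
    "Where Title = 'Administrative Assistant' " & _
    "And Department = 'Finance'"

objRecordset.Open strQuery, objConnection, _
    adOpenStatic, adLockOptimistic, adCmdText

' Only rows matching both conditions are in the recordset
Do Until objRecordset.EOF
    Wscript.Echo objRecordset.Fields.Item("Name").Value
    objRecordset.MoveNext
Loop

objRecordset.Close
objConnection.Close
```

The loop never changes; all the filtering logic lives in the query string.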

By the way, here’s a complete script that returns just the rows where the Number column is equal to 2:

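On Error Resume Next

' ADO constants (a script has to define these itself)
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordset = CreateObject("ADODB.Recordset")

' Connect to the spreadsheet; HDR=Yes means row 1 holds the column headers
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=C:\Scripts\Test.xls;" & _
    "Extended Properties=""Excel 8.0;HDR=Yes;"";"

' Retrieve only the rows where the Number column equals 2
objRecordset.Open "Select * FROM [Sheet1$] Where Number = 2", _
    objConnection, adOpenStatic, adLockOptimistic, adCmdText

' Echo the Name and Number fields for each matching row
Do Until objRecordset.EOF
    Wscript.Echo objRecordset.Fields.Item("Name"), _
        objRecordset.Fields.Item("Number")
    objRecordset.MoveNext
Loop

' Release the recordset and the connection
objRecordset.Close
objConnection.Close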

Before you ask, yes, there are other things you can do with ADO besides simply retrieve data. In a future column we’ll show you how you can use ADO to write data to a spreadsheet.

Hey, is it any wonder everyone loves Excel?