If you ever needed to query Excel tables (and who hasn’t) from your VBA programs that contained thousands of rows, or join multiple tables in the process – this is the Blog post you’ve been waiting for * Write complex SQL queries over Excel tables as if they were stored in a relational database
Do SQL and Excel go together?
SQL is the most popular Databases language. It is robust, proven and works with extremely large tables stored in any common relational Database software. If you want to learn more about Databases and SQL – read my last week’s Blog post here.
Excel is often used to store data in a table-like fashion: columns with headers and many rows below the columns’ headers.
If we have Excel tables (or ranges of cells arranged like tables) and we have SQL that is designed to query tables – why not use them both for better results?
Achieve unheard-of results with SQL in your VBA programs
Did I just say better results? Well, that was a polite way of saying: new possibilities are now open for you.
Well, to be accurate, I must say that technically speaking, you can do anything you want with VBA over Excel tables. However, going about quickly aggregating a 4-table join, processing thousands or tens of thousands of records – is not something I recommend doing without SQL (yes, even if using smart data structures and programming techniques, such as arrays).
What can be achieved in a single, well crafted SQL statement, may require hundreds of lines of VBA code, and the SQL statement will most probably perform much better.
What makes SQL available in VBA
Excel inherited the Jet engine developed by Microsoft for MS-Access. The Jet is a fair Databases connectivity and SQL engine, albeit not fully compliant with the SQL ANSI standards. The Jet is no longer developed and its latest version is 4.0.
Since MS-Office 2007 (.xlsx file extension), the Ace engine replaced the Jet engine (latest version: 16.0). The main benefits of the Ace over the Jet engine, are support for MS-Office 64bit, support some complex data types, encryption and security, and SharePoint and MS-Outlook integration.
However the ACE engine is backward-compatible, some inconsistencies may arise. I always suggest to start off with the Ace engine version 16, revert down to Ace version 12, and then (if not on MS-Office 64bit) default to the Jet engine.
The Jet engine is part of the Windows installation package. If Ace isn’t, download it here: https://www.microsoft.com/en-us/download/details.aspx?id=54920 (chose either the 32bit or the 64bit version to match your MS-Office installation (I said MS-Office – not MS-Windows!).
What you need to know
Although I cover in great detail VBA, relational Databases and SQL in my online courses, I will assume here that you know SQL and that you know VBA.
The rules you need to know about using SQL over Excel tables through the Ace/Jet engines, are:
- Data type of a column is determined by the engine based on the values in the first (8 by default) rows in that column. If you have sparsely filled out columns – explicitly fill them with something (zeros, for example, for numbers), or set the IMEX=1 property (more on that later).
- Tables are either a range of Worksheet cells or a Worksheet.
Refer to a Worksheet (requires that the “UsedRange” property resolve to cells arranged as a table) named “Past Invoices” using this syntax:
Refer to a range of cells within the “Past Invoices” Worksheet like that:
I always leave no room for interpretations and always specify the explicit range, except for ranges that exceed the rows limit of the Jet engine, which is 65536. In these cases, I’d leave an open ended range like that:
- Column names should be enclosed in square brackets:
- Nested joins of more than two tables must be enclosed in parenthesis:
FROM ((… JOIN … ON… ) JOIN … ON … )
- LIMIT clause becomes the TOP qualifier in the SELECT clause:
SELECT TOP 5 …
- GROUP BY must include ALL non-aggregated SELECT expressions. The first column is not enough.
- ORDER BY cannot reference aliases. Use the complete column expression as defined in the SELECT clause, or specify ordinal index in the SELECT clause:
ORDER BY 2, 4
Setting up the Connection
As with any data source, we first need to establish a connection using either the Jet engine or the Ace engine.
I typically use the Ace engine and establish an early binding to the ADODB library by adding a reference to the Microsoft ActiveX Data Objects 6.1 Library (in the VBA editor, open: Tools -> References and tick this entry).
Next, we create an instance of the ADODB Connection object in a local variable and call the Open method in order to establish the connection. We need to provide the correct connection string to the Open method – more about that shortly.
Here’s a full example of a Subroutine that establishes a connection to the current Excel Workbook (“ThisWorkbook”):
Public ConnDB as ADODB.Connection
'Open the connection to the DB Provider, if not already open
Dim strProvider As String
'strProvider = "Provider=Microsoft.Jet.OLEDB.4.0;" 'If using the old JET engine
strProvider = "Provider=Microsoft.ACE.OLEDB.16.0;"
On Error GoTo FailedConnection
If (ConnDB Is Nothing) Then
Set ConnDB = New ADODB.Connection
If Not (ConnDB.State = 1) Then
ConnDB.Open strProvider & _
"Data Source=" & ThisWorkbook.FullName & ";" & _
"Extended Properties=""Excel 12.0;HDR=Yes;"";" 'The JET engine only works with "Excel 8.0"
MsgBox "Failed connecting to Excel as a DB. Please contact support.", vbOKOnly + vbCritical, "Database Error"
Set ConnDB = Nothing
Couple of comments on that code:
- I typically define the connection variable (ConnDB) to be public, so that it is readily available to any function submitting SQL statements, as well as to the connecting Sub (above) and the disconnecting Sub.
- Before opening the connection, I verify it is not already open (State = 1 means it is open).
About the Connection String
As you can see, the Connection String has a Provider part, a Data Source part, and an Extended Properties part.
The Provider part is the engine used.
The Data Source part points to the full path and file name of an accessible Workbook. In this example I’m just taking ThisWorkbook’s FullName property
The Extended Properties allows us some control over the behavior of the engine. Note how it is in itself a string enclosed within double quotes, therefore they are escaped (“”) as they are within an already constructed string (the Connection String itself).
Here are the important parameters for the Extended Properties, delimited by semicolon (;):
- Excel file version. Excel 12.0 refers to an Excel 2007 or later. For earlier versions (used with the Jet engine) use: Excel 8.0
- HDR specifies if the first row of the range contains column headers or not.
- ReadOnly will open the Workbook in Read Only mode.
- MaxScanRows allows to override the default 8 first rows in determining the columns’ data type. Specify a number between 1 and 16. Specify 0 to consider all rows in the column.
- IMEX=1 avoids the guessing of the columns’ data types by the engine. All values will be interpreted as text (I prefer that!)
Yes, most SQL statements will work, including INSERT, UPDATE, DELETE.
However, the best dish, in my mind, is given to us in the SELECT command.
Here’s an example of constructing an SQL query that find the top 5 selling Body Care products.
We see a join between the Products table and the Order Lines table, with a SUM aggregate on the Total column in the Order Lines table.
The result is presented in the range with a top-left cell named Top_Five_Products_Anchor.
As the Ace engine does not know what is a Smart Table (ObjectList), I’m translating the tables’ range address into the proper notation valid for the Ace engine. Mainly, we need to strip off the “$” character that is reserved as the delimiter between the Worksheet name and the range itself, and enclose within square brackets.
Dim strTable1 As String
Dim strTable2 As String
Dim strSQL As String
Dim rs As New ADODB.Recordset
strTable1 = "[DB Data$" & Replace(ThisWorkbook.Sheets("DB
Data").ListObjects("ExcelarateProductsTable").Range.Address, "$", "")
strTable2 = "[DB Data$" & Replace(ThisWorkbook.Sheets("DB
Data").ListObjects("ExcelarateOrdersLinesTable").Range.Address, "$", "")
strSQL = "SELECT TOP 5 PRD.[Product Code], PRD.[Product Description], SUM(ORL.[Total])"
strSQL = strSQL & " FROM " & strTable1 & " AS PRD"
strSQL = strSQL & " INNER JOIN " & strTable2 & " AS ORL ON PRD.[Product Code]=ORL.[Product]"
strSQL = strSQL & " WHERE PRD.[Product Group]='Body Care'"
strSQL = strSQL & " GROUP BY PRD.[Product Code], PRD.[Product Description]"
strSQL = strSQL & " ORDER BY SUM(ORL.[Total]) DESC"
rs.Open strSQL, ConnDB
For simplification, I haven’t implemented error handling here (although we have that in the called Subs).
Disconnecting from the Database
Finally, here’s my version of the DisconnectDB Sub:
'Close the connection to the DB if open
On Error GoTo FailedConnection
If Not (ConnDB Is Nothing) Then
If Not (ConnDB.State = 0) Then
MsgBox "Failed closing the DB connection.", vbOKOnly + vbCritical, "Database Error"
Set ConnDB = Nothing
The above code examples are taken from my online courses: Business Applications with Excel and MySQL Database and Advanced Microsoft Excel Applications with SQL.
WANT TO LEARN PROGRAMMING WITH EXCEL VBA?
I created a complete 8-course program for you.
From programming to databases with Excel VBA to deliver business-grade solutions.
Take the first course today: Computer Programming with Excel VBA.
Click the button to see the complete program and start now!