The Next Level

Document automation in the 21st century

MS Office and Ruby

Why do it?

Why use Ruby and not VBA? That’s actually a good question. VBA is the obvious incumbent… built-in to Office and easily accessible without the necessity to install any other tools/programming languages. However, there are a number of reasons why VBA is just so much harder to use once your requirements get a little trickier. Let’s go through some of these:

Regular Expression Support

When was the last time that you tried using regular expressions in VBA? Sure it is possible but it requires that you first include references to the Microsoft VBScript Regular Expressions 5.5 library in your project. Once you do this you can instantiate a new RegExp object and test various text strings against this object looking for a match.

This is what this would look like in code:

1
2
3
4
5
6
7
8
9
Dim rx As RegExp
Dim test_string As String
Set rx = New RegExp
rx.Pattern = "<<Pattern goes here>>"
test_string = "<<String that you want to filter on based on matches>>"

If rx.Test(test_string) Then
    debug.Print test_string
End If

The equivalent Ruby code would be something like this:

1
puts test_string if /Pattern/.match test_string

So clearly Ruby wins out in terms of simplicity, its Perl heritage shining through as a killer tool in doing text manipulation.

No String Interpolation

There just isn’t any. Period. Multiline text strings are a pain also. You end up building strings by doing silly things such as:

1
2
3
4
5
Dim sql As String
Dim list_of_ids As Variant
list_of_ids = Array(2,12,44,51)
sql = "SELECT FROM " & table_name & " WHERE "  & vbCrLf & _
      "ID IN (" & IDs_as_comma_separated_string(list_of_ids) & ")"

Did you notice that we would also need a separate function to return the IDs in the Array as a comma separated string? But that’s a matter for another discussion! Did you also notice the underscore + space above ? It is a line continuation trick in VBA. Nifty, but not so nifty when you consider that you could have done this in ruby:

1
2
3
4
5
list_of_ids = [2,12,44,51]
sql =<<"END_OF_SQL"
SELECT FROM #{table_name}
WHERE ID IN (#{list_of_ids.join(", ")})
END_OF_SQL

No separate functions needed to join the arrays with a comma+space, nice multi-line strings with the HEREDOC style string declaration and complex string interpolation to boot.

Ruby wins again!!!

Hard to Use Data Types

I don’t know about you but arrays are hard to use in VBA. Once they are initialised dynamically and then set to a specific size and you need to change the size of the array doing things like Redim Preserve or Redim (if you don’t care about losing the data stored in the array up to that point).

1
2
3
4
5
6
7
8
9
10
11
12
13
Dim arr() As String                               ' <--- initialise
Dim i As Long

ReDim arr(20) As String                            '<--- ReDim to a certain size
                                                   '     How many will I need? 
Const SomeLargeValueLargerThanArrayBounds = 2000

For i = 1 To SomeLargeValueLargerThanArrayBounds
    If i > UBound(arr) Then                        '<---
        ReDim Preserve arr(i) As String            '<--- Three LOC to manage this 
    End If                                         '<---
    arr(i) = "String " & i
Next i

Whilst in Ruby, the following would apply:

1
2
3
4
arr = []
some_iterable.each_with_index do |iter, i|
    arr << "String #{i}"
end

Have you tried to sort an array recently within VBA? This is even messier! You need to bring your own BubbleSort function (or your favourite sorting algorithm) and don’t even try sorting on more than one criteria, unless you are a sucker for punishment!

There are Collections, but they are fairly complicated to use compared to what you get out of them. What about hashes or dictionaries? Again, it is possible to use these by including the Microsoft Scripting Runtime library in your project. However it still feels clumsy and overly verbose compared to Ruby.

A typical use would be as shown below:

1
2
3
Dim hash As Scripting.Dictionary
Set hash = New Scripting.Dictionary
hash.CompareMode = TextCompare

then later, assuming that we just want to return a dictionary containing the matches as keys with the length of the strings as values

1
2
3
4
If Not hash.Exists(thing) Then
    hash.Add Key:=thing, _
             Item:=Len(thing)
End If

Notice also that you can’t have hash.Add(Key:=thing, Item:=Len(thing) with the parenthesis!!! VBA is very confusing when it comes to the use of parenthesis/brackets around the argument list. For example, when calling a function which does not return a value, we do not use brackets around the argument list. However, if we were returning a value that is being assigned to another variable then we must use brackets around the argument list. When calling a subroutine we don’t use parenthesis, unless we are calling the subroutine by using an explicit Call. The .Add instance method of a Dictionary isn’t returning a value so it must not use brackets around the argument list. Confused yet?!?

Thankfully, looping through the Dictionary to return the keys and values is pretty straight forward:

1
2
3
4
Dim my_val As Variant
For Each my_val in hash.Keys
    debug.Print my_val, hash(my_val)
next my_val

In Ruby, the above would easily be simplified to:

1
2
3
4
5
hash = {}
# Loop through a whole bunch of code and update thing
hash[thing] = thing.size unless hash.include?(thing)
# keep looping until done
hash.each { |my_key, my_val| puts "#{my_key}\t#{my_val}" }

Dictionaries (called hashes by Rubyists) are great in Ruby!!! In Ruby 1.9 they even maintain the order of the elements in the hash (which will come in handy when we automate the generation of a Gant Chart from a YAML file in an up and coming post).

Arrays are also a pleasure to use in Ruby! You can dynamically add and remove elements from the array without having to write supporting boilerplate code (as you do in VBA with Redim Preserve). You can also mix data types! Then there are sets which allow for the quick implementation of algorithms returning unique values from an array. It is straight forward to sort elements in the array and it is not hard to do so for multiple array elements. It is possible to reverse, join and do all sorts of useful things to arrays in Ruby, either in-place or to a copy of the original array. This is an area where Ruby clearly wins hands down! Ruby code is much more succint and expressive than VBA and working with data structures (even relatively complex ones) is made easy!

Difficult Code Re-use

In VBA code within MS Office, code re-use is facilitated by the use of Templates, Add-Ins, etc. However, quite often, when one isn’t specifically working on building templates and Add-Ins, one-off macros tend to reside in the same spreadsheet where you needed to implement them in order to meet that important deadline. Trying to figure out where these were stored some time later in the future is a struggle, unless you are very methodical and suffer a little from obsessive compulsive disorder (OCD)!

In Ruby, it is easy enough to generate modules or classes in order to re-use code. Using code generated by the Ruby community is easy with the use of gems. Installing gems is dead easy wiht the “gem install” command. However, it would be a good idea to ensure that the scripts are kept in the same folder as the native Office document that is associated with them, unless the script was a throw-away script that isn’t going to be required some time later. Even then, there is an option to search through the scripts as these are plain text files. Naturally, it is a bit harder to do the same against, say, an Excel binary or even against a hierarchy of XML files in the newer versions of Office.

Typical Example

Let’s apply this to the following example. Suppose that I have a workbook with two worksheets, Fruits and Results; one containing a list of Fruits and another containing the results of running a regular expression filter on those fruits that match a specific criterion. Suppose that we are specifically looking for fruits that start with B and end in y.

This is a contrived example as one wouldn’t even need to use Excel to implement a solution to this, but please just go along with me for the time being…

Based on what we covered above, the very first step in writing the macro solution to this problem would be to include the specific references to the two libraries required:

  • Microsoft Scripting Runtime
  • Microsoft VBScript Regular Expressions 5.5

This is what our VBA macro listing looks like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
Option Explicit
'------------------------------------
Public Sub ConTheFruiterer()

Dim xls As Excel.Worksheet
Dim EOF As Long
Dim hash As Scripting.Dictionary
Dim rx As RegExp
Dim test_string As String
Dim i As Long

Set hash = New Scripting.Dictionary
hash.CompareMode = TextCompare

Set rx = New RegExp
rx.Pattern = "^B.*y$"

Set xls = ThisWorkbook.Worksheets("Fruits")
EOF = xls.Range("B1048576").End(xlUp).Row

For i = 1 To EOF
    test_string = xls.Range("B" & i).Value
    If rx.Test(test_string) Then
        If Not hash.Exists(test_string) Then
            hash.Add Key:=test_string, _
                     Item:=Len(test_string)
        End If
    End If
Next i

Dim my_val As Variant
Dim row_id As Long
row_id = 0
Set xls = ThisWorkbook.Worksheets("Results")
' Clear the results of a previous run
xls.Range("A1").CurrentRegion.Clear
For Each my_val In hash.Keys
    row_id = row_id + 1
    xls.Range("A" & row_id).Value = my_val
    xls.Range("B" & row_id).Value = hash(my_val)
Next my_val

End Sub

Everything above is pretty standard apart from the method that we employed to find the last used row in the ‘Fruits’ worksheet:

1
EOF = xls.Range("B1048576").End(xlUp).Row

This is the canonical MS way of finding the last used row and it is similar to the way that we do it directly from Excel. Think of it as navigating to the bottom of Column B and with the cursor in that position, doing a control up-arrow to get to the last used row. This is exactly what we will be doing in our Ruby solution as well.

When run, we end up with the following:

The Ruby Way

For all its expressiveness and built-in text wrangling tools, writing scripts in Ruby to manipulate Office documents does have some disadvantages. Let’s quickly go through these:

No VBE to develop the code in

Actually, I use Sublime Text 2 and prefer to use it any day compared to the Code Editor IDE built-in to Office applications (i.e. the VBE). However, you will be without code completion and will have to keep two or more separate files for the solution, which sort of brings me to the next point…

Not Integrated

You won’t be able to make use of the Form widgets available in Office to build custom applications. Hence, using Ruby lends itself to applications where some high powered analysis of the data in the worksheet is required or where you will run a script to auto-generate a document, powerpoint or project schedule from a number of source files. These are the sort of applications that we will cover at length in this forum.

Impedance Mismatch

You will need to think in Ruby but, at the same time, be constrained by the OLE ecosystem. Your code won’t look like pure Ruby code with pure snake_case variable declarations and method definitions. The OLE methods will be CamelCased. Rather than starting coding your solution straight away, there will be some extra steps associated with calling the win32ole library, instantiating the appropriate Office Application, etc. Fortunately, Ruby syntax is not so dissonant from VBA syntax as to make the whole premise unviable. For instance, instance and class methods use the dot notation and named parameters in VBA can be easily emulated with Ruby 1.9’s new hash notation.

Let’s look at this VBA snippet to add a new task to a project schedule in MS Project:

1
tsk = proj.Tasks.Add(Name:= task_name, Before:= at_row)

Note the Ruby equivalent:

1
2
3
4
5
# N.B.: proj set to the current project
# Ruby 1.8 hashrocket syntax
tsk = proj.Tasks.Add('Name' => task_name, 'Before' => at_row)
# Ruby 1.9 new hash syntax
tsk = proj.Tasks.Add(Name: task_name, Before: at_row)

VBA Constants are harder to use

Actually, they are but only a little if you use this technique…

  1. Start off by creating a class to hold the constants
  2. Load the constants in the class
  3. Use the constant

Note that you will need to reference the constant with reference to the class. The constant will need to be capitalised in order to work in with Ruby (where all constants are capitalised). Hence xlUp becomes ExcelConst::XlUp.

1
2
3
4
5
6
# N.B.: app previously set to the Excel Application
class ExcelConst
end

WIN32OLE.const_load(app,ExcelConst)
eof = sht.Range("C65536").End(ExcelConst::XlUp).row

Ok, so let’s translate the above VBA macro into Ruby. We’ll start with the standard boilerplate which is going to be part of each Ruby script accessing MS Excel via OLE.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# --- Boilerplate required to load win32ole library and initialise Excel
require 'win32ole'

begin
    app = WIN32OLE.connect('Excel.Application')
rescue
    app = WIN32OLE.new('Excel.Application')
end

app.Visible = true
app.DisplayAlerts = false

class ExcelConst
end

WIN32OLE.const_load(app, ExcelConst)

# --- End of Boilerplate

Please note with regards to the above:

  • Line 2: This is where we require the win32ole library which makes it all possible
  • Lines 4 to 8: We either launch a new Excel process or attach ourselves to an existing one
  • Lines 10 and 11: Some sane defaults while we are starting out as it is reassuring to launch Excel and see it up and running on our screen
  • Lines 13 to 18: Importing Constants. Note that in this case we could habe just used a magic number for a quick hack, but it is preferably to get into the habit of doing this

Ok, now it is time for Ruby to start earning its keep. The rest of the code is an approximate match to the previous VBA code but in Ruby.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
wb = app.Workbooks.Open('C:\Users\rdapaz\Desktop\ConTheFruiterer.xlsm')
xls = wb.Worksheets('Fruits')

eof = xls.Range("B1048576").End(ExcelConst::XlUp).row
hash = {}

(1 .. eof).each do |row|
    test_string = xls.Range("B#{row}").Value
    if /^B.*y$/.match test_string
        hash[test_string] = test_string.size unless hash.include? test_string
    end
end

counter = 0
xls = wb.Worksheets('Results')
hash.each do |k, v|
    counter += 1
    xls.Range("A#{counter}").Value = k
    xls.Range("B#{counter}").Value = v
end

A couple of things stand out:

  • Line 7: the use of a block to iterate through the rows in the (1 .. eof).each bit
  • Line 9: the baked-in use of regular expressions in /pattern/.match test_string
  • Line 10: the use of the keyword unless in place of if not
  • Line 10: the fact that unless can be used as a statement modifier
  • Lines 16 to 20: the ease of iteration over the contents of the hash

It should be said that with this very trivial example, the line count of both solutions is roughly the same. This is due, however, to the fact that we needed to deal with the WIN32OLE boilerplate code in the ruby code. The actual code associated with the business logic is much more succint in the Ruby version. With more complex reports, you will probably find that the Ruby version will be significantly shorter than the VBA version.

In summary, VBA provides some very good tools for customising and extending Ms Office applications. However, by chosing to implement certain solutions in a standalone text editor using scripting languages such as Ruby, one tends to be much more succint and expressive in problem domains such as text manipulation, albeit, at the loss of facilities such as code completion, syntax highlighting, etc. By using a good text editor such as Sublime Text 2 (or EmEditor, Notepad++, etc.) and the techniques demonstrated above, it is possible to minimise the impact of losing the native IDE, whilst leveraging off the strengths or Ruby such as a wide collection of re-usable code packaged as gems and baked in features such as regular expression, hashes, sets, etc.

That’s all for now. I hope that you have enjoyed my very first blog and please keep coming back for some more updates…

Comments