[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

Defining Excel Functions without Visual Basic: a compiler that converts Excel function definition sheets to VBA

How easy would Java programming be if you had to define all your methods in Fortran? Excel does not allow you to define your own functions, unless you do so in Visual Basic for Applications (VBA), a language with a different syntax and semantics from Excel formulae. So I've been experimenting with a "function compiler", that reads function definitions written as Excel formulae from a worksheet, compiles them into VBA, and inserts this into your spreadsheet. I'm going to demonstrate below, using a simple relativistic calculation as one example. This, incidentally, is purely functional programming, although it uses two-dimensional positions on a worksheet, rather than names, to distinguish intermediate calculations.

To explain quickly, here's an example. The screen shot below shows part of an Excel spreadsheet, displaying a worksheet I've named MyFunction:

I intend this to define a function named Foo, with 2 arguments. So I've put my function's name into cell B1, and the number of arguments into C1. I shall run an Excel macro (that is, a subroutine written in Visual Basic for Applications), which will call my function compiler and create a new function from this information.

Rows 2 and 3 represent Foo's arguments. Following them, row 4 represents Foo's result. The crucial part is cell B4, which as you can see from the formula bar, holds the formula =B2+(B3/1000).

To turn this definition into code, the macro first saves the worksheet to file. It then runs the function compiler (written in Prolog), which reads the file and extracts the contents of each cell. It uses the top row to make a VBA function header, like this:

Public Function Foo(var_1 As Double, var_2 As Double) As Double
Here, var_1 and var_2 are names it has generated for the function's formal parameters.

The compiler then counts down two rows to find the result formula =B2+(B3/1000). It replaces "B2" by "var_1" and "B3" by "var_2". With the resulting expression, it makes a VBA function body:

Foo = var_1 + var_2 / 1000

And it then appends a VBA function terminator. The resulting VBA is:

Public Function Foo(var_1 As Double, var_2 As Double) As Double
  Foo = var_1 + var_2 / 1000
End Function

And finally, the function compiler returns to the macro that called it, which inserts the code into a new VBA code module. In general, self-modifying code is a Bad Thing — Dobbs Code Talk readers won't need persuading of its potential for bugs and lack of readability — but in this case, the ability to generate code at runtime and add it to the spreadsheet is very useful.

And now, I can call my function from a formula. This screen shot shows part of sheet Sheet1, with a formula in A1 that calls Foo:

The technique extends to function definitions that combine partial results calculated in different cells. Here's an example: an unusual one for Excel, because it's a relativistic calculation of four-dimensional distance. I've chosen it because it uses a simple formula, but one we can break into parts that can be evaluated and annotated separately in different places on the worksheet.

By the way, you can also use it to impress your friends. In college, a friend once spotted me reading Taylor and Wheeler's Spacetime Physics, riffled through the book, and asked me to do the first exercise he saw. This was the calculation of some four-dimensional distance — say between two events a week apart. He was most impressed when I could do so in my head. To some people, the Fourth Dimension is still as mystical as it was to the Victorian spiritualists. But in fact, this aspect of space-time geometry is very simple.

In everyday life, we can calculate the distance between two points in three-dimensional space. Call the differences in their x, y, and z coordinates x, y, and z. Then this distance is:

sqrt( x2 + y2 + z2 ).

Relativity treats time as a dimension, and lets us extend the above expression to calculate the distance between two events separated by time as well as space. Call the differences in their x, y, z, and t coordinates x, y, z, and t. Then this distance is:

sqrt( x2 + y2 + z2 - (ct)2 ).
Here, c is the speed of light: it converts time units to space ones, giving (ct)2 the same units as x2, y2, or z2. The minus sign is because, after all, time isn't exactly the same kind of thing as space, as any book on the geometry of space-time will explain.

So imagine that a tourist lets off his camera flash at midday in the Royal Observatory, Greenwich. (In the Royal Observatory, there are always tourists.) One second later, a little way up the staircase in the Flamsteed House, a second tourist lets off his flash, one metre north, one metre east, and one metre above him. What is the four-dimensional distance between these flashes? Here's a sheet that calculates it:

This sheet has the same kind of layout as my Foo example. Row 1 names the function in B1, and states its number of arguments in C1. Rows 2 to 5 represent the function's arguments. Row 6 represents its result. The crucial part is cell B6, which points at B27. This is the end of a chain of calculations which you can see annotated above, starting at B15 and B18. The above screen shot doesn't show the actual formulae in the cells, so here they are, in a listing provided by the function compiler:

Cells
  B1:C27
where:
  B1 = "SpaceTimeDistance"
  C1 = 4
  B2 = 1
  B3 = 1
  B4 = 1
  B5 = 1
  B6 = B27
  B15 = 299792458
  B18 = (B15*B5)^2
  B21 = B2^2+B3^2+B4^2
  B24 = B18-B21
  B27 = B24^0.5

Getting back to relativity for a moment, although the sheet is treated by my function compiler as a function definition, it is also performing a calculation on the argument values in cells B2 to B5. So you can see it as a test or demonstration of the function. The four-dimensional distance between the two flashes is, it tells me, 299,792,458 metres. If you want to verify this, the spreadsheet is linked from the screen shot above. This is almost the same distance that light would travel in a second, and is almost entirely due to the (ct)2 term in the formula. In comparison, the 1 metre separations are negligible.

By the way, I have inverted the sign of the formula in B24. Otherwise, with the argument values I gave, B24 would be negative, and B27, its square root, would be an imaginary number. That makes physical sense in such calculations, but Excel can't do complex arithmetic, so would report an error. The only imaginary numbers it knows are those deficted by accountants.

Now let's return to the function compiler. This analyses this sheet in the same way as it did the Foo example. It generates names var_1 to var_4 for the function's formal parameters, and makes a VBA function header with them. It then translates the formula in B6 into a VBA expression giving the function's result. But this time, B6 refers to B27, which refers to B24, which refers to B18 and B21... So the compiler follows these cell references, extracts their formulae, and substitutes them in place of the references. This gives the following:

Public Function SpaceTimeDistance(var_1 As Double, var_2 As Double, var_3 As Double, var_4 As Double) As Double
  SpaceTimeDistance = ((299792458 * var_4) ^ 2 - (var_1 ^ 2 + var_2 ^ 2 + var_3 ^ 2)) ^ 0.5
End Function

And now, I can call SpaceTimeDistance from a formula, as this screen shot shows:

The function compiler is a prototype, and originated in experiments I did in 2004 in compiling spreadsheets to Prolog, which I wrote up rather roughly here. That, I believe, was inspired partly by a prospective customer who wanted me to compile their spreadsheet to Prolog, and partly by the paper A User-Centred Approach to Functions in Excel by Simon Peyton Jones, Alan Blackwell, and Margaret Burnett. They ask what notation one could give Excel users for defining functions in Excel formula language, while making this easy to learn by using as few new concepts as possible.

I thought I might be able to compile this style of function definition to an SWI-Prolog program, save the program as an executable, then call this from Excel. Last week, I discovered that Excel lets me add VBA code at runtime, so it seemed worth trying to compile to VBA instead of Prolog.

As I remarked at the beginning, VBA's syntax and semantics differ from those of Excel formulae, so my compiler cannot simply copy formulae verbatim. For example, I have discovered that in VBA, the expression 100000000*100000000 provokes an overflow error. But the Excel formula =100000000*100000000 evaluates, no fuss and no bother, to 1E+16. To avoid the VBA error, one can denote its constants as reals: 100000000.0*100000000.0.

Moreover, as the VBA Help topic "Using Microsoft Excel Worksheet Functions in Visual Basic" tells us, most Excel functions need to be called through a special interface, and cells need to be referenced via instances of the Range class. So the Excel formula =MATCH(9,A1:A10,0) would translate to something like:

Application.WorksheetFunction.Match(9, Worksheets(1).Range("A1:A10"), 0)
Here, the "Worksheets(1)" names the worksheet holding cells A1:A10.

More fundamentally, one would need to translate IFs specially, be careful about the evaluation time of arguments to some functions, and also be careful about error-handling. It may not be possible to translate every formula exactly. Nevertheless, the function compiler could be very useful. For example, as a "helper wizard" to help novices learn VBA by building VBA function definitions. It would be interesting to combine this with the idea of spreadsheet components that I've promoted in my Spreadsheet Parts Repository.

Appendix

Should you be interested, this trace shows the function compiler's diagnostics when compiling the Foo function:

Sheet = MyFunction.
Input file = c:\windows\temp\function_compiler_in.xml.
Output file = c:\windows\temp\function_compiler_out.vba.

Contents of spreadsheet =
Cells
  MyFunction!B1:C4
  Sheet1!B3:C6
where:
  MyFunction!B1 = "Foo"
  MyFunction!C1 = 2
  MyFunction!B2 = 1
  MyFunction!B3 = 1
  MyFunction!B4 = MyFunction!B2+MyFunction!B3/1000
  Sheet1!A1 = foo(20, 30).

Compiling at left-hand corner: MyFunction!A1.

Function and arity = Foo and 2.

Formal parameter cells = [ MyFunction!B2, MyFunction!B3 ].
Generated formal parameter cell variables = [ var_1, var_2 ].

Generated function head = Foo(var_1, var_2).

Dereferenced function result = +( MyFunction!B2, /( MyFunction!B3, 1000 ) ).
Dereferenced function result with cell refs replaced by variables = +( var_1, /( var_2, 1000 ) ).

Function head in VBA syntax = 'Public Function '<>'Foo'<> ( seplist_([var_1<>' as double', var_2<>' as double'], ', ' )<>' ) as Double'.
Function tail in VBA syntax = var_1+var_2/1000.