The Watchmaker And The Spreadsheet Factory:
Compositional Spreadsheet Programming

[ Jocelyn Ireson-Paine's Home Page | Publications ]

I want to start with Herbert Simon's parable of the watchmakers:

Suppose each watch consists of 1000 pieces. The first watchmaker constructs the watch as one operation assembling a thousand parts in a thousand steps. The second watchmaker builds intermediate parts, first 100 modules of 10 parts each, then 10 subassemblies of 10 modules each, then a finished watch out of the subassemblies, a somewhat longer process, 110 steps longer.

It would seem that constructing a watch in a single sequential process would progress faster and produce more watches. Alas, life being what it is, we can expect some interruptions. Stopping to deal with some environmental disturbance, like a customer, the watchmaker puts down the pieces of an unfinished assembly.

Each time the first watchmaker puts down the single assembly of 1000, it falls apart and must be started anew, losing up to 999 steps. Interrupting the second watchmaker working on a module of 10 using hierarchical (in the first sense) construction means a loss of at most 9 steps.

For organizing complexity, the moral is this: taking a few extra steps in the short run, saves many steps in the long run.

Traditional spreadsheet programming has almost no features for building spreadsheets out of parts that can be developed and tested independently. This is unfortunate, because programmers from Simon onwards have emphasised the need for modularity in programming. This is what my EuSpRIG 2005 presentation and paper were about, and what I explain further here.

A summary for those with appropriate experience is that I have devised a functional language called Excelsior whose data types include spreadsheets. By writing spreadsheet-valued functions, one can create pieces of spreadsheet that are parameterisable over, say, varying numbers of rows. These are modules. By using a spreadsheet-composition operator, one can compose modules into complete spreadsheets which can be saved as Excel files. One can also define modules as specifications in terms of logical tables, only later mapping them onto actual rows and columns.

To start with, a bit about Excelsior as a language. It's an interactive system to which you type commands and get back replies. These commands can be formulae to be evaluated: if you type

  1+2.
Excelsior will reply with 3; if you type
  (1+2)*5.
it will reply 15. So you can use it as a desk calculator.

Excelsior knows other data types such as vectors:

  Exc ?- vector(1,2) + vector(3,4).
  Result = vector(4,6)

Amongst the things it understands are cell addresses:

  Exc ?- vector(1,2) + Sheet1!A1.
  Result = Sheet1!B3
and
  Exc ?- Sheet2!D10 - vector(2,5).
  Result = Sheet2!B5

So, in general, it's a system for interactive calculation with data types of use to spreadsheeters. These types include spreadsheets themselves, parts of spreadsheets to be used as modules, and specifications written in terms of tables rather than layout on a worksheet. To make these adaptable, they can be parameterised. Here's an example:

  let accounts_spec( Y1, Y2 ) be
    {# Profits[ ^(Y1) : ^(Y2) ]
     , In[ ^(Y1) : ^(Y2) ]
     , Out[ ^(Y1) : ^(Y2) ]
    |  Profits[ all y ] = In[y] - Out[y]
    #}.
This is an Excelsior function of two arguments, Y1 and Y2. (It's good literary style to use the subjunctive when defining functions.)

Had I written

  let add( Y1, Y2 ) be
    Y1 + Y2.
then I'd have defined a function that returns the sum of Y1 and Y2. 'add' is a very simple example of a function. However, 'accounts_spec' is more complicated. It returns a group of three tables, called Profits, In, and Out. You know they're tables because, like arrays in conventional languages, they're followed by square brackets.

They're also enclosed in brackets {# to #}, which tells Excelsior they're either a spreadsheet or a specification to be translated into a spreadsheet. Keyboard limitations mean I don't have enough single-bracket characters for everything I want, but if you combine the idea of { ... } which enclose sets in mathematical notation with # which looks like a spreadsheet grid, then think of a spreadsheet as a set of equations, it's mnemonic enough to use {# and #} as spreadsheet brackets.

In this definition, each table has a lower bound equal to Y1 and an upper bound equal to Y2 - the ^ symbol tells Excelsior to take the value of Y1 and Y2, rather than treating them as cell names, also permitted in this context. Think of ^ as an arrow symbolising "poke it in here".

After the tables, there's a | and an equation. In maths, | means "such that". So altogether, the

  {# Profits[ ^(Y1) : ^(Y2) ]
   , In[ ^(Y1) : ^(Y2) ]
   , Out[ ^(Y1) : ^(Y2) ]
   | Profits[ all y ] = In[y] - Out[y]
  #}
means "a group of tables; which are named In, Out and Profits; which have bounds Y1 to Y2; such that each element of Profits is the difference of the corresponding elements of In and Out".

This thing is a specification of calculations to be done by a spreadsheet: but like a number or a vector or a set, Excelsior regards it as a single mathematical entity. You can store it in variables, display it, save it to file, and pass it to functions.

As I said, the group is part of a function. I'll display that again:

  let accounts_spec( Y1, Y2 ) be
    {# Profits[ ^(Y1) : ^(Y2) ]
     , In[ ^(Y1) : ^(Y2) ]
     , Out[ ^(Y1) : ^(Y2) ]
    |  Profits[ all y ] = In[y] - Out[y]
    #}.
Before you can use it, you must supply values for Y1 and Y2. You do this by calling the function. With a two-year span:
  Exc ?- show( accounts_spec(2000,2001) ).
  {# In[2000:2001], Out[2000:2001], Profits[2000:2001] | Profits[2000] =
  In[2000]-Out[2000], Profits[2001] = In[2001]-Out[2001] #}.
  Result = void
or with a five:
  Exc ?- show( accounts_spec(2000,2005) ).
  {# In[2000:2005], Out[2000:2005], Profits[2000:2005] | Profits[2000] =
  In[2000]-Out[2000], Profits[2001] = In[2001]-Out[2001], Profits[2002] =
  In[2002]-Out[2002], Profits[2003] = In[2003]-Out[2003], Profits[2004] =
  In[2004]-Out[2004], Profits[2005] = In[2005]-Out[2005] #}.
  Result = void
Notice how the equations differ between the two results.

So we have a specification of some calculations to be performed over a range of years, adaptable to different start and end years. It isn't yet a spreadsheet: we now need to make it one.

We do this by stipulating how the tables are to be laid out on a worksheet, via the "mapping" operator:

  Exc ?- let ss be accounts_spec( 2000, 2001 )
                   mapping [ In[2000:2001] to Sheet1!A2:A3
                           , Out[2000:2001] to Sheet1!B2:B3
                           , Profits[2000:2001] to Sheet1!C2:C3
                           ].
  Result = <SPREADSHEET>
  Exc ?- show(ss).
  {# Sheet1!C2:C3 | Sheet1!C2 = Sheet1!A2-Sheet1!B2, Sheet1!C3 =
  Sheet1!A3-Sheet1!B3 #}.
  Result = void
That is a spreadsheet, and I can save it to file:
  Exc ?- save(  "accts.xml", ss ).
or invoke Excel directly:
  Exc ?- excel(  ss ).

Adaptability now rules. I could have used a different span of years, or a different allocation of tables to cells, or both:

  Exc ?- let ss2 be accounts_spec( 1987, 1992 )
                    mapping [ In[1987:1992] to Sheet1!A2:F2
                            , Out[1987:1992] to Sheet1!A3:F3
                            , Profits[1987:1992] to Sheet1!A4:F4
                            ].
  Result = <SPREADSHEET>
  Exc ?- show( ss2 ).
  {# Sheet1!A4:F4 | Sheet1!A4 = Sheet1!A2-Sheet1!A3, Sheet1!B4 =
  Sheet1!B2-Sheet1!B3, Sheet1!C4 = Sheet1!C2-Sheet1!C3, Sheet1!D4 =
  Sheet1!D2-Sheet1!D3, Sheet1!E4 = Sheet1!E2-Sheet1!E3, Sheet1!F4 =
  Sheet1!F2-Sheet1!F3 #}.
Notice how the ranges' size and layout differ from the previous translation.

I think I need to make the resulting spreadsheet more comprehensible to its end-users, so I'd better add some labels. Their positions depend on the cells to which they refer, so have to be specified in terms of worksheet locations, not as logical tables like In and Out.

We're trading in a multilingual world, so I should also make the labels parameterisable by language. Here's a function thar returns a row of labels running along the top of a spreadsheet, parameterisable to return English or Dutch:

  let captions( Lang ) be
    ( Lang = "en" ->
        {# Sheet1!A1:A3
         | Sheet1!A1 = "Income"
         , Sheet1!B1 = "Outgoings"
         , Sheet1!C1 = "Profits"
        #}
    ; Lang = "nl" ->
        {# Sheet1!A1:A3
         | Sheet1!A1 = "Verkopen"
         , Sheet1!B1 = "Uitgaven"
         , Sheet1!C1 = "Voordeel"
        #}
    ).
The ( ... -> ... ; ... ) construct here is like an Excel IF. In fact, I should probably implement Excel IF syntax as an alternative, but it is harder to read when conditionals are nested. The function just returns one of two one-row spreadsheets depending on whether the parameter Lang is "en" or "nl". Incidentally, strings are Unicode, so work with non-Latin characters such as Cyrillic and Japanese: I tried with kanji, but they don't display properly in this email.

How do we use this function? Well, calling it on its own will return the labels on their own:

  Exc ?- show( captions("en") ).
  {# Sheet1!A1:A3 | Sheet1!A1 = "Income", Sheet1!B1 =
  "Outgoings", Sheet1!C1 = "Profits" #}.
  Result = void
and
  Exc ?- show( captions("nl") ).
  {# Sheet1!A1:A3 | Sheet1!A1 = "Verkopen", Sheet1!B1 =
  "Uitgaven", Sheet1!C1 = "Voordeel" #}.
  Result = void

However, on their own they're no use: we want to add them to the accounts spreadsheet. We do this with the union operator \/ :

  Exc ?- let ss be accounts_spec( 2000, 2001 )
                   mapping [ In[2000:2001] to Sheet1!A2:A3
                           , Out[2000:2001] to Sheet1!B2:B3
                           , Profits[2000:2001] to Sheet1!C2:C3
                           ].
  Result = <SPREADSHEET>
  Exc ?- let ss_labelled be ss \/ captions("en").
  Result = <SPREADSHEET>
  Exc ?- show(ss_labelled).
  {# Sheet1!A1:C3 | Sheet1!A1 = "Income", Sheet1!B1 = "Outgoings",
  Sheet1!C1 = "Profits", Sheet1!C2 = Sheet1!A2-Sheet1!B2,
  Sheet1!C3 = Sheet1!A3-Sheet1!B3 #}.

That example was both interactive and long-winded, because I was stopping to explain each language feature as I introduced it. The same code, though, can be stored in a script file and run in one go. Excelsior is happy with any number of files, so you can build up libraries of specification modules and put them together in a separate script. You can also generate specification modules by reverse-engineering existing spreadsheets.

That then is my current approach to modularisation. I'm not expecting spreadsheeters with experience of writing only for Excel to use this interface: it's aimed at IT admin types who need to manage a department's spreadsheets and who are used to languages such as Perl. However, Excel-only users who are willing to learn VBA so they can define their own functions (as recommended, for example, in Phil's paper) oughtn't to find Excelsior any harder to learn. In fact, since it eschews such complexities as pointers and call-by-reference (with their opportunities for dangling references and unintended aliasing), and the Excel object model, it ought to be simpler.

[ Jocelyn Ireson-Paine's Home Page | Publications ]