[ Jocelyn Ireson-Paine's Home Page | Publications | More about Excelsior ]

Excelsior: bringing the benefits of modularisation to Excel

Jocelyn Ireson-Paine

www.j-paine.org/
www.spreadsheet-factory.com/

 

"There are two ways of constructing a software design. One is to make it so simple that there are obviously no deficiencies; the other is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
C.A.R. Hoare
The Emperor's Old Clothes
Communications of the ACM, February 1981

 

I'm going to talk about my work on modularising spreadsheets. I believe this will confer the same benefits on spreadsheeting as modularisation does on other programming languages: simpler design, coding, testing, debugging and documentation; less chance of mistakes; saving work, time, and money.

Here's a little story to show why spreadsheets need modularity. Imagine you work in economic forecasting - in a company like Oxford Economic Forecasting, say, or the Institute for Fiscal Studies. You have just completed a model of the Belgian economy which forecasts the growth of the construction industry over the next ten years. It's a spreadsheet, naturally. Your boss is pleased, and asks you to do a similar model for Holland. No problem. You open a new spreadsheet, copy the guts of your model into it (they're both Eurozone countries, so there's not much to change), and tweak some economic parameters. It works; and a little later, you're asked for a German model. And an Austrian. And ...

Then, your boss needs a longer-term forecast. "We must have it 20 years ahead, not 10." You have to go through each model inserting rows. Then the Belgian office says, "we need the models with the labels in French. And Flemish; they speak that here, too."

You go for a much-needed holiday. While you're on the beach, your assistant receives a call. Your company's main investor has heard a rumour that your models grossly overestimate growth - their assumptions are faulty. He demands immediate validation of all spreadsheets. Your assistant, who wasn't present when you built the models, doesn't know that they're largely all copies of one another. Here he is: Cartoon: spreadsheet user fruitlessly 
hacks through error-ridden non-modular spreadsheet

Here's how I want him to be: Cartoon: spreadsheet user smugly 
contemplates error-free modular spreadsheet

Generally speaking in engineering, modules are parts or subassemblies. A module makes sense on its own; it can be designed, built, tested, debugged and documented on its own. But it can also be combined into a larger assembly. To do so, you don't need to know how it works: just where to plug it in and what its outputs are as a function of its inputs. It's a black box whose external interface is entirely distinct from its internal workings. And if you need to try and understand something that already contains a module, you don't waste time delving into those internal workings; you read the spec.

The notion applies to software engineering in many ways. Functions are modules. So are procedures and coroutines. Abstract data types are, and classes in object-oriented languages. And some languages actually have constructs called modules.

This is familiar stuff to computer scientists. I'm labouring it because it may not be to spreadsheeters from other backgrounds. Spreadsheeters didn't realise (thanks, EuSpRIG) how risky spreadsheeting is; and they don't realise how necessary modularisation is. Perhaps I should exert proof by authority. Here's a selection of quotes by assorted computer scientists and software engineers:

"The Rule of Modularity bears amplification here: The only way to write complex software that won't fall on its face is to build it out of simple modules connected by well-defined interfaces, so that most problems are local and you can have some hope of fixing or optimizing a part without breaking the whole."
Eric Raymond
"Early Unix programmers became good at modularity because they had to be. An OS is one of the most complicated pieces of code around. If it is not well structured, it will fall apart. There were a couple of early failures at building Unix that were scrapped. One can blame the early (structureless) C for this, but basically it was because the OS was too complicated to write. We needed both refinements in tools (like C structures) and good practice in using them (like Rob Pike's rules for programming) before we could tame that complexity."
Ken Thompson
"If you want to avoid having a missed bug affect an arbitrarily large proportion of the system, the answer is compartmentalization (more modularity), with effective enforcement of the boundaries between compartments. qmail's strategy of having several different processes is an example of this. Orange Book mandatory access controls are another. These mechanisms are actually able to prevent a missed bug from affecting an arbitrarily large proportion of the system."
Kragen Sitaker
"You can model the value of modularity in a decision-theoretic way by using the Black-Scholes equation. Roughly, modularity increases your set of options for modifying a system, and it's a basic result of options theory that having a portfolio of individual options is more valuable than having an option on a portfolio. (Intuitively, a program whose components you can selectively upgrade is more valuable than one which is a monolith which must be upgraded in a big bang.) But to use that result, it must actually be the case that your modules are modular - and that's where you need abstraction as a semantic property."
Neel Krishnaswami
"The reliable design of a highly sophisticated program is anyway a difficult task and we could place our question in a much broader perspective and ask ourselves 'How does the human mind invent something very intricate, how does the human mind think difficult thoughts?' Fascinating as those general questions are, I shall not touch upon them. I shall restrict myself to the more limited field of programming, because already in the programming field we can distinguish five elements of mental discipline, each of which is a great help in keeping our programs understandable. In order to be able to talk about them, I must name them, I have called them sequencing discipline, operational abstraction, representational abstraction, configurational abstraction and textual encapsulation. In order to avoid false hopes: all these elements are known, all higher level programming languages cater more or less successfully for some of them."
Edsger Dijkstra

So isn't it a shame about Excel?

"From a programming language point of view, then, spreadsheets lack the most fundamental mechanism that we use to control complexity: the ability to define re-usable abstractions. In effect, they deny to end-user programmers the most powerful weapon in our armory. Can you imagine programming in C without procedures, however clever the editor’s copy-and-paste technology?"
Simon Peyton Jones, Margaret Burnett, and Alan Blackwell
Improving the world's most popular functional language: user-defined functions in Excel

I add modularity by saying that modules are just spreadsheets. You can write them completely as usual, using Excel. Hopefully, you will test and document them. Using my software, you can then glue them together into a larger spreadsheet.

The software provides operations on spreadsheets. This includes putting them together, but also cutting pieces out of them (because you may not want to put all of two spreadsheets together), shifting them into the right position, and so on. These are the operators:

As I said, these work on spreadsheets. You are treating these spreadsheets as modules which can be designed, tested, debugged, and documented each on their own before being combined into a larger system, but they are still just spreadsheets. (Actually, the "module-valued functions" are something different, a kind of generalised spreadsheet whose shape and contents can be adjusted according to its parameters. But that's an advanced topic.)

I have implemented these operations in my programming language, Excelsior, which calculates with spreadsheets in the same sense that other languages calculate with numbers, vectors, matrices, and strings. Excelsior is a "desk calculator" or "computer algebra system" for spreadsheets.

Please note that Excelsior is not for users who know only Excel. I still have to design an interface for them. Excelsior will form its foundations.

Please note also that Excelsior contains a lot of other goodies beside the module operations. For example, you can examine and change the contents of any cell of any spreadsheet; you can compare cells against their styles; you can check for cells with undefined precedents; you can print all the formulae in a spreadsheet; there are "structure discovery" tools for making spreadsheets more intelligible; and so on. I designed these so that programmers familiar with "scripting languages" such as Perl or Python could use Excelsior as a scripting language for spreadsheets.

Now let me show some examples of Excelsior. Like computer-algebra languages such as Matlab, Maple and Macsyma, it's an interactive language. You type a command; Excelsior evaluates it and displays a result. Usually in these examples, the result is a spreadsheet; either one I've loaded from file, or one generated on the spot with the module operations. Excelsior holds these new spreadsheets as data structures in memory. I can ask it to display their formulae with the show function, or to open them in Excel with the excel function. The examples do both. Where I called Excel, I show the result by saving the spreadsheet as HTML and pasting it in. So let's go:


?- 1+2. 
3
[ Like any interactive algebra program, Excelsior can
calculate with numbers. ]

?- 3*4. 
12

?- let x be 1.
?- let y be 2.
?- x+y. 
3
[ Excelsior can define variables to hold results you may want to use
again. ]

?- let x be load("demo.xml").
[ The above assignments put numbers in x and y. This one
puts an entire spreadsheet into x. The spreadsheet came from the
Excel saved file demo.xml. Excelsior holds it as a complicated
data structure, much as it might hold a mathematical
object like a vector or a matrix. ]

?- show(x).
spreadsheet( Sheet1!A2:D4
      ) 
      Sheet1!A2 = 2000
      Sheet1!A3 = 2001
      Sheet1!A4 = 2002
      Sheet1!B2 = 1492
      Sheet1!B3 = 1560
      Sheet1!B4 = 1796
      Sheet1!C2 = 971
      Sheet1!C3 = 1803
      Sheet1!C4 = 2701
      Sheet1!D2 = Sheet1!C2-Sheet1!B2
      Sheet1!D3 = Sheet1!C3-Sheet1!B3
      Sheet1!D4 = Sheet1!C4-Sheet1!B4.
[ Show the formulae in the spreadsheet. ]

?- excel(x).
[ Opens the spreadsheet in Excel as below: ]
2000 1492 971 -521
2001 1560 1803 243
2002 1796 2701 905
?- let labels be load("labels.xml"). [ Loads another spreadsheet and puts it into the variable called 'labels'. ] ?- show(labels). spreadsheet( Sheet1!A1:D1 ) Sheet1!A1 = "Year" Sheet1!B1 = "Expenses" Sheet1!C1 = "Sales" Sheet1!D1 = "Profit". ?- let z be x \/ labels. [ Adds x and y to make z. ] ?- show(z). spreadsheet( Sheet1!A1:D4 ) Sheet1!A1 = "Year" Sheet1!A2 = 2000 Sheet1!A3 = 2001 Sheet1!A4 = 2002 Sheet1!B1 = "Expenses" Sheet1!B2 = 1492 Sheet1!B3 = 1560 Sheet1!B4 = 1796 Sheet1!C1 = "Sales" Sheet1!C2 = 971 Sheet1!C3 = 1803 Sheet1!C4 = 2701 Sheet1!D1 = "Profit" Sheet1!D2 = Sheet1!C2-Sheet1!B2 Sheet1!D3 = Sheet1!C3-Sheet1!B3 Sheet1!D4 = Sheet1!C4-Sheet1!B4. ?- excel(z). [ Opens the spreadsheet in Excel, showing the labels have been added: ]
Year Expenses Sales Profit
2000 1492 971 -521
2001 1560 1803 243
2002 1796 2701 905
?- let labels_nl be load("labels_nl.xml"). [ Now I'll load a different set of labels, in Dutch. ] ?- show( labels_nl ). spreadsheet( Sheet1!A1:D1 ) Sheet1!A1 = "Jaar" Sheet1!B1 = "Uitgaven" Sheet1!C1 = "Verkopen" Sheet1!D1 = "Voordeel". ?- let z be x \/ labels_nl. [ I've added these to the original spreadsheet. See how easy it was to generate first an English-labelled version, and then a Dutch. ] ?- excel(z).
Jaar Uitgaven Verkopen Voordeel
2000 1492 971 -521
2001 1560 1803 243
2002 1796 2701 905

Now please be patient, because this example is more complicated. Excelsior can also be used to transformed to show how a spreadsheet's author intends to group its cells.


?- let x be load("demo.xml").
[ I load the same little accounts
spreadsheet as above. ]

?- show(x).
spreadsheet( Sheet1!A2:D4
      ) 
      Sheet1!A2 = 2000
      Sheet1!A3 = 2001
      Sheet1!A4 = 2002
      Sheet1!B2 = 1492
      Sheet1!B3 = 1560
      Sheet1!B4 = 1796
      Sheet1!C2 = 971
      Sheet1!C3 = 1803
      Sheet1!C4 = 2701
      Sheet1!D2 = Sheet1!C2-Sheet1!B2
      Sheet1!D3 = Sheet1!C3-Sheet1!B3
      Sheet1!D4 = Sheet1!C4-Sheet1!B4.

?- let y be x mapping Sheet1!A2:A4 to Year[2000:2002].
[ This is new. I tell Excelsior that
I want to transform this spreadsheet so that cell A2 is
renamed "Year[2000]", cell A3 is renamed "Year[2001]", 
and cell A4 is renamed "Year[2002]". ]

?- show(y).
spreadsheet( Sheet1!B2:D4
           , Year[2000:2002]
      ) 
      Sheet1!B2 = 1492
      Sheet1!B3 = 1560
      Sheet1!B4 = 1796
      Sheet1!C2 = 971
      Sheet1!C3 = 1803
      Sheet1!C4 = 2701
      Sheet1!D2 = Sheet1!C2-Sheet1!B2
      Sheet1!D3 = Sheet1!C3-Sheet1!B3
      Sheet1!D4 = Sheet1!C4-Sheet1!B4
      Year[2000] = 2000
      Year[2001] = 2001
      Year[2002] = 2002.
[ Those three cells have been renamed. ]

?- let y be x mapping [ Sheet1!A2:A4 to Year[2000:2002], 
                        Sheet1!B2:B4 to Expenses[2000:2002], 
                        Sheet1!C2:C4 to Sales[2000:2002], 
                        Sheet1!D2:D4 to Profit[2000:2002] 
                      ].
[ You can do several renamings at once. ]

?- show(y).
spreadsheet( Expenses[2000:2002]
           , Profit[2000:2002]
           , Sales[2000:2002]
           , Year[2000:2002]
      ) 
      Expenses[2000] = 1492
      Expenses[2001] = 1560
      Expenses[2002] = 1796
      Profit[2000] = Sales[2000]-Expenses[2000]
      Profit[2001] = Sales[2001]-Expenses[2001]
      Profit[2002] = Sales[2002]-Expenses[2002]
      Sales[2000] = 971
      Sales[2001] = 1803
      Sales[2002] = 2701
      Year[2000] = 2000
      Year[2001] = 2001
      Year[2002] = 2002.
[ See how much easier to read this is than the
original? ]

So Excelsior can read spreadsheets from files and build them from other spreadsheets. You can also write them as intelligible specifications like the one above, and then reverse the transformation to convert into a spreadsheet. This lets us use these specifications as the master version of our modules. It also lets us separate layout from calculation, so we can adapt a spreadsheet to different layouts as needed.


?- let x be {# Expenses[2000:2001]
             , Profit[2000:2001]
             , Sales[2000:2001]
             , Year[2000:2001]
             | Expenses[2000] = 1492
             , Expenses[2001] = 1560
             , Profit[all Y] = Sales[Y]-Expenses[Y]
             , Sales[2000] = 971
             , Sales[2001] = 1803
             , Year[2000] = 2000
             , Year[2001] = 2001
             #}.
[ I have put an intelligible specification for
an accounts spreadsheet into x. Excelsior holds this using
the same kind of data structure it does for spreadsheets.
I can transform it into a spreadsheet by reversing the
mapping I did earlier. I am going to do so in TWO ways, one in
the left-hand column of this table and one in the
right-hand column. Please compare them. ]

?- let y be x mapping [ Year[2000:2001] to Sheet1!A2:A3,
                        Expenses[2000:2001] to Sheet1!B2:B3, 
                        Sales[2000:2001] to Sheet1!C2:C3,
                        Profit[2000:2001] to Sheet1!D2:D3
                      ].

?- show(y).
spreadsheet( Sheet1!A2:D3
      ) 
      Sheet1!A2 = 2000
      Sheet1!A3 = 2001
      Sheet1!B2 = 1492
      Sheet1!B3 = 1560
      Sheet1!C2 = 971
      Sheet1!C3 = 1803
      Sheet1!D2 = Sheet1!C2-Sheet1!B2
      Sheet1!D3 = Sheet1!C3-Sheet1!B3.

?- excel(y).
2000 1492 971 -521
2001 1560 1803 243

?- let z be x mapping [ Year[2000:2001] to Sheet1!A2:B2,
                        Expenses[2000:2001] to Sheet1!A4:B4, 
                        Sales[2000:2001] to Sheet1!A6:B6,
                        Profit[2000:2001] to Sheet1!A8:B8
                      ].

?- show(z).
spreadsheet( Sheet1!A2:B8
      ) 
      Sheet1!A2 = 2000
      Sheet1!A4 = 1492
      Sheet1!A6 = 971
      Sheet1!A8 = Sheet1!A6-Sheet1!A4
      Sheet1!B2 = 2001
      Sheet1!B4 = 1560
      Sheet1!B6 = 1803
      Sheet1!B8 = Sheet1!B6-Sheet1!B4.

?- excel(z).
2000 2001
1492 1560
971 1803
-521 243

And Bingo!, layout-calculation independence. For more info on Excelsior, please see my EuSpRIG 2005 paper on Excelsior. I also have an early Excelsior specification. This is for the Prolog interface to Excelsior, and it may look pretty complicated. Don't worry about that: just look at the contents lists and examples to see what features there are.

[ Jocelyn Ireson-Paine's Home Page | Publications | More about Excelsior ]