[ Jocelyn Ireson-Paine's Home Page | Tool demonstrations for this paper | Contact ]

Module Expressions for Modularising Spreadsheets and Sharing Code between Them

Jocelyn Ireson-Paine

Abstract

Microsoft Excel has no way to share code between different spreadsheets. One must copy formulae from one spreadsheet into another, creating multiple copies of one's code with all the problems of maintenance and updating that this causes. I show that by regarding spreadsheets as theories, and module composition as theory sum and theory renaming, one can share code. I have implemented this in a spreadsheet-description language named Excelsior, and in a service that enables developers to copy modules into existing spreadsheets.
Spreadsheets, Excel, spreadsheet safety, modularisation, algebraic specification, equational programming

1  Introduction

Spreadsheets are dangerous - they are so easy to misprogram. Because they are often used by financial organisations, much danger is financial: such as miscalculated profits and taxes. For example, there was a notorious case in September 2008 just after Lehman went bust. Barclays decided to buy some of Lehman's contracts. A Lehman's exec listed these in a spreadsheet which he sent to Barclays' law firm. But the spreadsheet had 179 hidden rows holding contracts that the exec didn't intend be seen. The law firm didn't realise this, so Barclays ended up buying them [7]. Other examples of the dangers can be found on the Horror Stories page of the European Spreadsheet Risks Interest Group's Web site [3].
One defect in Excel, and the one I attack here, is that you can't share code between different spreadsheets or parts of a spreadsheet. If you have written formulae that do some calculation, and you want to use the calculation elsewhere, you must copy and paste these formulae. The copied formulae probably won't work, because their input cells won't be the same as in the originals. But even if they were, you'd need to correct all the copies each time you fix a bug, and that would make maintenance a nightmare. Excel just has no way to write modules that can be independently coded, tested, and documented.
To overcome this, I have implemented a modularisable spreadsheet-description language, Excelsior, that compiles to Excel. Excelsior programs live in text files, and you can edit them in the same way as programs in - for example - C++ and Java. They can use meaningful identifiers rather than A1-style cell addresses, so are easier to read than Excel. These identifiers name rectangular arrays; the Excelsior compiler generates a spreadsheet by mapping these arrays to regions on Excel worksheets. The relevance to Calco and algebraic specification is that Excelsior programs and modules, and spreadsheets, can be seen as equational theories. Module inclusion becomes theory sum, possibly combined with a renaming morphism. I have implemented this in two different systems. One is the Excelsior compiler, downloadable free via [13]. The other is a service from which users can copy modules into already-written spreadsheets [14].
The rest of the paper explains how spreadsheets and Excelsior programs can be seen as theories, and how this implements module composition. I end with a demo of how several copies of a module can be included in a spreadsheet. Readers and reviewers can run my demonstrations by downloading from the Web page at [13]. There are fuller descriptions of Excelsior in blog postings which show how to code the game of Life [11] and how to simulate loans [12], and in a EuSpRIG paper showing how to generate science-fiction plots and search cell ranges for text [9].

1.1  Relation to Other Work

Module inclusion as theory sum is an idea I first met in the algebraic-specification language OBJ3 [6], but I believe it goes back to OBJ3's ancestor Clear [1,2]. Both are instances of the discipline known as parameterised programming [5], which uses module expressions to combine modules into systems. So is Excelsior. All versions of Excelsior have had modules. I aimed my previous papers, however, at spreadsheet developers rather than algebraic programmers: this paper is the first that explicitly relates the two. As far as I know, mine is the first application of algebraic programming to spreadsheets, and the first module system for spreadsheets that is built on solid mathematical foundations.
My first version of Excelsior was inspired by Goguen's sheaf semantics for concurrent interacting objects [4]. I saw spreadsheets as simulating physical objects [8]. Objects had attributes which changed over time, their values being spread out in successive rows or columns. This reflected the sheaf-semantics formulation of an object: a function from a "base set" such as time, to a set of products of attribute values. (To be more precise, an object is a set of functions related by the "sheaf condition". There isn't room to discuss that here: Goguen's paper gives details.) The Excelsior arrays that I describe below still correspond to these functions. However, spreadsheets don't only simulate physical objects, and although my sheaves-inspired language was nice for doing so, it seemed awkard for some other spreadsheeting tasks. So I didn't base later versions on objects.

2  Excelsior, Spreadsheets as Theories, and Modularisation

2.1  Spreadsheets Are Theories

If you list the formulae in a spreadsheet, it becomes clear that it can be regarded as an equational theory.
In David Copperfield, Micawber remarks that "Annual income twenty pounds, annual expenditure nineteen pounds nineteen and six, result happiness. Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery". So consider a one-row cash-flow spreadsheet that calculates how much money Micawber has, given his income and expenses. If cell A1 is 20, A2 is 20.025, and A3 holds the formula =A1-A2, then we have a theory whose equations are
A1 = 20 
A2 = 20.025 
A3 = A1 - A2 

The theory's sort structure is strange, because Excel has dynamic typing. Cells can hold numbers, text, Booleans, error indicators, and dates, amongst other things: there are abstruse rules for converting between them, determining when a cell is empty, and deciding how the built-in functions interpret them. It is probably best to regard all cells as being the same sort, the union of Excel's primitive types. Excel also allows arrays in cells, but I ignore that here.

2.2  Excelsior Programs Are Theories

Excelsior aims to make spreadsheets easier to read and code. Modularisation is one way; meaningful identifiers are another. The equations below are obviously more readable than those above:
income = 20 
expenses = 20.025
whats_left = income - expenses 

This, in a nutshell, is Excelsior. It reads equations together with information about sorts, and rewrites them to Excel formulae. And if a spreadsheet is a theory, so is a set of equations such as these.

2.3  Excelsior Syntax, Arrays, and Arrays as Functions

Many spreadsheets model money over a series of time points. For example, were Micawber to calculate net income over several years, he could extend my one-row spreadsheet by having a row for each year, displaying the year in column A, and shifting income, expenses and net income to columns B, C, and D. There would be a sequence of formulae =B1-C1, =B2-C2, and so on, in column D. I could code this in Excelsior as follows:
type years = 2011:2019.
// Micawber was alive in the 1800's, but Excel's
// DATE function won't generate dates before 1900.
table year      : years -> date.
table income    : years -> currency.
table expenses  : years -> currency.
table whats_left: years -> currency.
year[ y ] = DATE( y, 12, 31 ).
whats_left[ y ] = income[ y ] - expenses[ y ].
layout( 'Sheet 1'
      , rows( row( year, income, expenses, whats_left ) )
      ).

The table statements declare arrays with the specified bounds, and the equations quantify over these, setting each element of their left-hand side. The layout statement describes the target spreadsheet as a grid of arrays, arranging the arrays next to one another in the order specified. By default, one-dimensional arrays run vertically, so the spreadsheet will have year in column A, income in column B, and so on.
This way of describing layout separates appearance from behaviour. For example, to stack the arrays horizontally, I would allocate one array to each row, and use the as x annotation which makes them run along the x axis:
layout( 'Sheet 1', rows( row( year as x )
                       , row( income as x )
                       , row( expenses as x )
                       , row( whats_left as x )
                       )
      ).

If Excelsior programs are theories, arrays are functions. The array identifiers are the operation symbols in the theory's signature. Each array is a function from its set of indices to the Excel cell-value type. I should note that the current version of Excelsior does force programmers to declare array elements as one of date, currency, boolean and so on. This makes programs more self-documenting, and enables Excelsior to tell Excel how to format the array elements. Excelsior does not (yet) typecheck equations against these declarations.
Spreadsheets are theories too, so compiling an Excelsior program to a spreadsheet is a morphism from the program theory to the spreadsheet theory. By the standards of algebraic programming, it is strange because it applies coordinate transformations.
A worksheet cell could be seen either as a singleton array, or as one element of an array that spans the entire worksheet. I do the latter. The compilation morphism then becomes even stranger, since it merges all arrays (i.e. functions) that live on the same worksheet into one.

2.4  Recursion over Arrays

Arrays as functions lead to novel ways to program Excel, and increase the variety of modules one can code. For example, [9] describes a spreadsheet, downloadable from [15], that generates science-fiction plots by recursively walking a transition network that represents which plot events can follow one another. This may seem impossible, as Excel doesn't let you define your own functions. But by regarding arrays as functions, and writing recursive equations, one does get recursion. It's spread out in space rather than time; but it is definitely recursion.

2.5  Module Inclusion Is Theory Sum

Excelsior text files can be used as top-level programs, or as modules: both are theories. The include statement, which takes a filename as argument, includes a module into a program. Excelsior implements this by summing the theory contained in the file and the theory importing it.
Instead of include, I could have provided an explicit theory-sum operator. However, most common programming languages have some kind of module inclusion, and I felt that this would be more familiar to users.

2.6  Renaming Modules, and Including Several Copies

I have, however, adopted another module operation from the OBJ family: renaming. This combines theory sum with a renaming morphism. For example, the program below imports two copies of a module that splits input strings at a delimiter. The renaming construct parameterises the module, mapping types and arrays in it to types and arrays in the enclosing program.(It also maps constants, which there isn't room to explain.) The program instantiates the module twice, so there are two instances of each of its arrays: these get mapped to a different location on the worksheet. Here is the module:
/* This module splits a string at a delimiter. It defines:
     sep  : the delimiter, a constant.
     base : the base type of the arrays.
     input: the strings to be split.
     pos  : working storage, position of the delimiter.
     pre  : result, the part before the delimiter.
*/
constant sep.
type base.
table input  : base -> text.
table pos    : base -> general.
table pre    : base -> text.
pos[ i ] = FIND( sep, input[i], 1 ).
pre[ i ] = LEFT( input[i], pos[i]-1 ).

And here is the enclosing program:
constant delim1 = ",".
type range1 = 1:2.
table data1 : range1 -> text.
data1[1] = "Dijkstra,E.W.".
data1[2] = "Hoare,C.A.R.".
table delpos1: range1 -> general.
table before1: range1 -> text.
include "c:\excelsior\split.exc"
        renaming sep to delim1
                 base to range1
                 input to data1
                 pos to delpos1
                 pre to before1.
constant delim2 = " ".
type range2 = 1:1.
table data2 : range2 -> text.
data2[1] = "Algol 60". 
table delpos2: range2 -> general.
table before2: range2 -> text.
include "c:\excelsior\split.exc"
        renaming sep to delim2
                 base to range2
                 input to data2
                 pos to delpos2
                 pre to before2.
layout( 'Sheet 1', rows( row( data1, delpos1, before1 )
                       , row( data2, delpos2, before2 )
                       )
      ).

I have used this in my freelance programming. Thus, [9] shows a module that searches for cells matching a specified pattern and copies them to an output range. A spreadsheet for modelling social-housing finances that I worked on needed to apply this to five cell ranges, to make five lists of options for dropdown menus. To replicate these calculations five times in raw Excel would have been tedious, and a pain when amending them. But by coding them, and the rest of the spreadsheet, in Excelsior, I needed only to write five include statements.

2.7  Copying Modules into Existing Spreadsheets

Consider an existing spreadsheet with a "hole" we want to put a module into. The parameters to a module affect the shape, size, and position of its arrays. So we could write a "stub" program that includes the module but does nothing else, then compile it to a spreadsheet holding only the module. By adjusting these parameters, we can shape it to fit any hole; then copy it to that hole. This, albeit with more automation, is how my Spreadsheet Parts Repository [14] works. There isn't room for an example here, but I have a demo on my Calco-Tools demonstration page [13].

References

[1]
Burstall, R., Goguen, J.A: Putting theories together to make specifications. In: Reddy, R. (ed.) Proc. V IJCAI. Cambridge Mass., pp. 1045-1058 (1977), http://ijcai.org/Past%20Proceedings/IJCAI-77-VOL2/PDF/095.pdf
[2]
Burstall, R., Goguen, J.A: The semantics of Clear, a specification language. In: Bjorner, D. (ed.) Proc. Copenhagen Winter School on Abstract Software Specification. LNCS, vol. 86, pp. 292-332. Springer, Heidelberg (1980)
[3]
EuSpRIG Horror Stories. EuSpRIG Web page, http://www.eusprig.org/horror-stories.htm
[4]
Goguen, J.A: Sheaf Semantics for Concurrent Interacting Objects. Mathematical Structures in Computer Science, 2, 159-191 (1992), http://www.cs.ucsd.edu/~goguen/ps/sheaf.ps.gz, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.4296
[5]
Goguen, J.A, Tracz, W.: An Implementation-Oriented Semantics for Module Composition. In: Leavens, G., Sitaraman, M. (eds.) Foundations of Component-based Systems, pp. 231-263. Cambridge University Press (1997), http://cseweb.ucsd.edu/~goguen/ps/will.ps.gz, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.3793
[6]
Goguen, J.A., Winkler, T., Meseguer, J., Futatsugi, K., Jouannaud, J-P.: Introducing OBJ (1993), http://cseweb.ucsd.edu/~goguen/pps/iobj.ps, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.9983
[7]
Hayes, F.: Lehman-Barclays slip shows perils of data transfer. Computerworld (27 October 2008), http://computerworld.co.nz/news.nsf/mgmt/6F029D457D594675CC2574EC00010F62
[8]
Ireson-Paine, J.N.: Ensuring Spreadsheet Integrity with Model Master. In: Proceedings of EuSpRIG 2001, http://www.j-paine.org/eusprig2001_as_html/eusprig2001.html
[9]
Ireson-Paine, J.N.: Spreadsheet Components For All. In: Proceedings of EuSpRIG 2008, http://www.j-paine.org/eusprig2008.html
[10]
Ireson-Paine, J.N.: Spreadsheet Components, Google Spreadsheets, and Code Reuse. Copy of Dr. Dobbs blog posting (March 2009), http://www.j-paine.org/dobbs/spreadlets.html
[11]
Ireson-Paine, J.N.: Gliders, Hasslers, and the Toadsucker: Writing and Explaining a Structured Excel Life Game. Copy of Dr. Dobbs blog posting (March 2009), http://www.j-paine.org/dobbs/life.html
[12]
Ireson-Paine, J.N.: How to Document a Spreadsheet: an Exercise with Cash-Flow and Loans. Copy of Dr. Dobbs blog posting (March 2009), http://www.j-paine.org/dobbs/loans.html
[13]
Ireson-Paine, J.N.: Demonstrations for Calco-Tools 2011 (March 2011), http://www.j-paine.org/calco2011/demos.html
[14]
Ireson-Paine, J.N.: Ireson-Paine Spreadsheet Parts Repository home page, http://www.spreadsheet-parts.org/
[15]
Ireson-Paine, J.N.: Excel science-fiction generator. Ireson-Paine Spreadsheet Parts Repository Web page, http://www.spreadsheet-parts.org/sf.html



File translated from TEX by TTHgold, version 4.00.
On 30 Mar 2011, 13:14.