[ Jocelyn Ireson-Paine's Home Page
| Tool demonstrations for this paper
| Contact
]
Module Expressions for Modularising Spreadsheets
and Sharing Code between Them
Jocelyn Ireson-Paine
Abstract
Microsoft Excel has no way to share code between
different spreadsheets. One must copy formulae from one spreadsheet
into another, creating multiple copies of one's code with all the
problems of maintenance and updating that this causes. I show that by
regarding
spreadsheets as theories, and module composition as theory sum and theory
renaming, one can share code. I have implemented this
in a spreadsheet-description language named Excelsior,
and in a service that enables developers to copy modules into
existing spreadsheets.
Spreadsheets, Excel, spreadsheet safety, modularisation, algebraic
specification, equational programming
1 Introduction
Spreadsheets are dangerous - they are so easy to misprogram.
Because they are often used by financial
organisations, much danger is
financial: such as miscalculated profits and taxes.
For example, there was a notorious case in September 2008
just after Lehman went bust. Barclays decided to
buy some of Lehman's contracts. A Lehman's exec listed
these in a spreadsheet which he sent to Barclays' law
firm. But the spreadsheet had 179 hidden rows holding
contracts that the exec didn't intend be seen. The
law firm didn't realise this, so Barclays ended up
buying them [7]. Other examples of
the dangers can be found on the
Horror Stories page of the European Spreadsheet Risks
Interest Group's Web site [3].
One defect in Excel, and the one I attack here,
is that you can't share code between different
spreadsheets or parts of a spreadsheet.
If you have written formulae that do some calculation,
and you want to use the calculation elsewhere,
you must copy and paste these formulae.
The copied formulae probably won't work, because
their input cells won't be the same as in
the originals. But even if they were, you'd need to
correct all the copies each time you fix a bug, and that would make
maintenance a nightmare. Excel just has no way
to write modules that can be independently coded, tested,
and documented.
To overcome this, I have implemented a modularisable
spreadsheet-description
language, Excelsior, that compiles to Excel. Excelsior programs live in
text files, and you
can edit them in the same way as programs in - for example -
C++ and Java. They can use meaningful
identifiers rather than A1-style cell addresses,
so are easier to read than Excel. These identifiers
name rectangular arrays; the Excelsior compiler
generates a spreadsheet by mapping
these arrays to regions on Excel worksheets. The relevance to Calco
and algebraic specification is that Excelsior programs and modules,
and spreadsheets, can be seen as equational theories. Module
inclusion becomes theory sum, possibly combined with
a renaming morphism. I have implemented this in two different
systems. One is the Excelsior compiler, downloadable
free via [13].
The other is a service from which users can
copy modules into already-written spreadsheets [14].
The rest of the paper explains how spreadsheets and Excelsior
programs can be seen as theories, and how this implements module
composition. I end with a demo of how several
copies of a module can be included in a spreadsheet. Readers and reviewers
can
run my demonstrations by downloading from
the Web page at [13]. There are fuller descriptions of
Excelsior
in blog postings which show how to code the game of Life [11]
and how to simulate loans [12], and
in a EuSpRIG paper
showing how to generate science-fiction plots and
search cell ranges for text [9].
1.1 Relation to Other Work
Module inclusion as theory sum is an idea I first met
in the algebraic-specification language OBJ3 [6],
but I believe it goes back to OBJ3's ancestor
Clear [1,2]. Both are instances of the
discipline known as parameterised programming [5], which
uses module expressions to combine modules into systems. So
is Excelsior. All versions of Excelsior have had modules. I aimed
my previous papers, however, at spreadsheet developers rather than
algebraic programmers: this paper is the first
that explicitly relates the two. As far as I know, mine is the first
application of algebraic programming
to spreadsheets, and the first module system for spreadsheets
that is built on solid mathematical foundations.
My first version of Excelsior was inspired by Goguen's
sheaf semantics for concurrent interacting objects [4]. I saw spreadsheets as simulating
physical objects [8]. Objects had
attributes which changed over time,
their values being spread out in successive rows or columns. This
reflected the sheaf-semantics formulation of an object:
a function from a "base set" such as time, to a set of
products of attribute values. (To be more precise, an object is a set of
functions related
by the "sheaf condition". There isn't room to discuss that here: Goguen's
paper
gives details.) The Excelsior arrays that I describe
below still correspond to these functions. However, spreadsheets
don't only simulate physical objects, and although my sheaves-inspired
language
was nice for doing so, it seemed awkard for some other spreadsheeting
tasks. So I didn't base later versions on objects.
2 Excelsior, Spreadsheets as Theories, and
Modularisation
2.1 Spreadsheets Are Theories
If you list the formulae in a spreadsheet, it becomes
clear that it can be regarded as an equational theory.
In David Copperfield, Micawber remarks that
"Annual income twenty pounds, annual expenditure nineteen pounds
nineteen and six, result happiness. Annual income twenty pounds,
annual expenditure twenty pounds ought and six, result misery".
So consider a one-row cash-flow spreadsheet that calculates how much money
Micawber
has, given his income and expenses. If cell A1 is 20,
A2 is 20.025, and A3 holds the formula
=A1-A2, then we have a theory whose equations are
A1 = 20
A2 = 20.025
A3 = A1 - A2
The theory's sort structure is strange,
because Excel has dynamic typing. Cells can hold
numbers, text, Booleans, error indicators, and dates, amongst
other things: there are abstruse rules
for converting between them, determining when a cell
is empty, and deciding how the
built-in functions interpret them. It is probably best to
regard all cells as being the same sort, the union of
Excel's primitive types. Excel also allows arrays in cells, but I
ignore that here.
2.2 Excelsior Programs Are Theories
Excelsior aims to make spreadsheets easier to
read and code. Modularisation is one way; meaningful
identifiers are another. The equations below are obviously more
readable than those above:
income = 20
expenses = 20.025
whats_left = income - expenses
This, in a nutshell, is Excelsior. It reads equations together
with information about sorts, and rewrites them to Excel formulae.
And if a spreadsheet is a theory, so is a set of equations such as these.
2.3 Excelsior Syntax, Arrays, and Arrays as Functions
Many spreadsheets model money over a series of time points.
For example, were Micawber to calculate net income over several
years, he could extend my
one-row spreadsheet by having a row for each year, displaying the year
in column A, and shifting income,
expenses and net income to columns B, C, and D.
There would be a sequence of
formulae =B1-C1, =B2-C2, and so on,
in column D. I could code this in Excelsior as follows:
type years = 2011:2019.
// Micawber was alive in the 1800's, but Excel's
// DATE function won't generate dates before 1900.
table year : years -> date.
table income : years -> currency.
table expenses : years -> currency.
table whats_left: years -> currency.
year[ y ] = DATE( y, 12, 31 ).
whats_left[ y ] = income[ y ] - expenses[ y ].
layout( 'Sheet 1'
, rows( row( year, income, expenses, whats_left ) )
).
The table statements declare arrays with the specified
bounds, and the equations quantify over these, setting each element of
their left-hand side. The layout statement describes the target
spreadsheet as a grid of arrays, arranging the
arrays next to one another in the order specified. By default,
one-dimensional
arrays run vertically, so the spreadsheet will have year in
column A,
income in column B, and so on.
This way of describing layout separates
appearance from behaviour. For example, to stack the arrays horizontally,
I would allocate one array to each row, and
use the as x annotation which makes them run along the
x axis:
layout( 'Sheet 1', rows( row( year as x )
, row( income as x )
, row( expenses as x )
, row( whats_left as x )
)
).
If Excelsior programs are theories, arrays are functions.
The array identifiers are the operation symbols in the theory's signature.
Each array is a function from its set of indices to
the Excel cell-value type. I should note that
the current version of Excelsior does
force programmers to declare array elements
as one of date, currency, boolean and so on.
This makes
programs more self-documenting, and enables Excelsior to tell Excel
how to format the array elements. Excelsior
does not (yet) typecheck equations against these declarations.
Spreadsheets are theories too, so compiling an Excelsior program to a
spreadsheet
is a morphism from the program theory
to the spreadsheet theory. By the standards of
algebraic programming, it is strange because it
applies coordinate transformations.
A worksheet cell could be seen either as a singleton array,
or as one element of an array that spans the entire
worksheet. I do the latter. The compilation morphism
then becomes even stranger, since it
merges all arrays (i.e. functions) that live on the same
worksheet into one.
2.4 Recursion over Arrays
Arrays as functions lead to novel ways to
program Excel, and increase the variety of modules one can code.
For example, [9] describes
a spreadsheet, downloadable from
[15], that generates science-fiction
plots by recursively
walking a transition network that represents
which plot events can follow one another. This may seem impossible,
as Excel doesn't let you define your
own functions. But by regarding arrays as functions, and
writing recursive equations, one does get recursion. It's
spread out in space rather than time; but it is
definitely recursion.
2.5 Module Inclusion Is Theory Sum
Excelsior text files can be used as top-level programs, or as modules:
both are theories. The include statement, which takes a
filename as argument, includes a module into a program.
Excelsior implements this by summing
the theory contained in the file and the theory
importing it.
Instead of include, I could have provided an
explicit theory-sum operator. However, most common
programming languages have some kind of module inclusion,
and I felt that this would be more familiar to users.
2.6 Renaming Modules, and Including Several Copies
I have, however, adopted another module operation
from the OBJ family: renaming. This combines
theory sum with a renaming morphism.
For example, the program below imports two
copies of a module that splits input strings
at a delimiter. The renaming construct parameterises
the module, mapping types and arrays in it to types and
arrays in the enclosing program.(It also maps constants, which there isn't
room to explain.) The program
instantiates the module twice, so there are two instances
of each of its arrays: these get mapped to
a different location on the worksheet. Here is the module:
/* This module splits a string at a delimiter. It defines:
sep : the delimiter, a constant.
base : the base type of the arrays.
input: the strings to be split.
pos : working storage, position of the delimiter.
pre : result, the part before the delimiter.
*/
constant sep.
type base.
table input : base -> text.
table pos : base -> general.
table pre : base -> text.
pos[ i ] = FIND( sep, input[i], 1 ).
pre[ i ] = LEFT( input[i], pos[i]-1 ).
And here is the enclosing program:
constant delim1 = ",".
type range1 = 1:2.
table data1 : range1 -> text.
data1[1] = "Dijkstra,E.W.".
data1[2] = "Hoare,C.A.R.".
table delpos1: range1 -> general.
table before1: range1 -> text.
include "c:\excelsior\split.exc"
renaming sep to delim1
base to range1
input to data1
pos to delpos1
pre to before1.
constant delim2 = " ".
type range2 = 1:1.
table data2 : range2 -> text.
data2[1] = "Algol 60".
table delpos2: range2 -> general.
table before2: range2 -> text.
include "c:\excelsior\split.exc"
renaming sep to delim2
base to range2
input to data2
pos to delpos2
pre to before2.
layout( 'Sheet 1', rows( row( data1, delpos1, before1 )
, row( data2, delpos2, before2 )
)
).
I have used this in my freelance programming.
Thus, [9] shows a module that
searches for cells
matching a specified pattern and copies them to
an output range. A spreadsheet for modelling social-housing
finances that I worked on needed to
apply this to five cell ranges, to make five lists of
options for dropdown menus. To replicate these calculations
five times in raw Excel would have been tedious, and
a pain when amending them. But
by coding them, and the rest of the spreadsheet, in Excelsior,
I needed only to write five include statements.
2.7 Copying Modules into Existing Spreadsheets
Consider an existing spreadsheet with a "hole" we want to put a module
into.
The parameters to a module affect the shape, size, and position of its
arrays.
So we could write a "stub" program that includes the
module but does nothing else, then compile it to a spreadsheet holding
only the module. By adjusting these parameters, we can shape
it to fit any hole; then copy it to that hole. This, albeit with more
automation, is
how my Spreadsheet Parts Repository [14]
works. There isn't room for an example
here, but I have a demo on my Calco-Tools demonstration page [13].
References
- [1]
-
Burstall, R., Goguen, J.A:
Putting theories together to make specifications.
In: Reddy, R. (ed.)
Proc. V IJCAI.
Cambridge Mass., pp. 1045-1058 (1977),
http://ijcai.org/Past%20Proceedings/IJCAI-77-VOL2/PDF/095.pdf
- [2]
-
Burstall, R., Goguen, J.A:
The semantics of Clear, a specification language.
In: Bjorner, D. (ed.)
Proc. Copenhagen Winter School on Abstract Software Specification.
LNCS, vol. 86, pp. 292-332.
Springer, Heidelberg (1980)
- [3]
-
EuSpRIG Horror Stories.
EuSpRIG Web page,
http://www.eusprig.org/horror-stories.htm
- [4]
-
Goguen, J.A:
Sheaf Semantics for Concurrent Interacting Objects.
Mathematical Structures in Computer Science,
2, 159-191
(1992),
http://www.cs.ucsd.edu/~goguen/ps/sheaf.ps.gz,
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.4296
- [5]
-
Goguen, J.A, Tracz, W.:
An Implementation-Oriented Semantics for Module Composition.
In: Leavens, G., Sitaraman, M. (eds.)
Foundations of Component-based Systems,
pp. 231-263.
Cambridge University Press (1997),
http://cseweb.ucsd.edu/~goguen/ps/will.ps.gz,
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.3793
- [6]
-
Goguen, J.A., Winkler, T., Meseguer, J., Futatsugi, K., Jouannaud, J-P.:
Introducing OBJ (1993),
http://cseweb.ucsd.edu/~goguen/pps/iobj.ps,
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.9983
- [7]
-
Hayes, F.:
Lehman-Barclays slip shows perils of data transfer.
Computerworld
(27 October 2008),
http://computerworld.co.nz/news.nsf/mgmt/6F029D457D594675CC2574EC00010F62
- [8]
-
Ireson-Paine, J.N.:
Ensuring Spreadsheet Integrity with Model Master.
In: Proceedings of EuSpRIG 2001,
http://www.j-paine.org/eusprig2001_as_html/eusprig2001.html
- [9]
-
Ireson-Paine, J.N.:
Spreadsheet Components For All.
In: Proceedings of EuSpRIG 2008,
http://www.j-paine.org/eusprig2008.html
- [10]
-
Ireson-Paine, J.N.:
Spreadsheet Components, Google Spreadsheets, and Code Reuse.
Copy of Dr. Dobbs blog posting
(March 2009),
http://www.j-paine.org/dobbs/spreadlets.html
- [11]
-
Ireson-Paine, J.N.:
Gliders, Hasslers, and the Toadsucker: Writing and Explaining a Structured
Excel Life Game.
Copy of Dr. Dobbs blog posting
(March 2009),
http://www.j-paine.org/dobbs/life.html
- [12]
-
Ireson-Paine, J.N.:
How to Document a Spreadsheet: an Exercise with Cash-Flow and Loans.
Copy of Dr. Dobbs blog posting
(March 2009),
http://www.j-paine.org/dobbs/loans.html
- [13]
-
Ireson-Paine, J.N.:
Demonstrations for Calco-Tools 2011
(March 2011),
http://www.j-paine.org/calco2011/demos.html
- [14]
-
Ireson-Paine, J.N.:
Ireson-Paine Spreadsheet Parts Repository home page,
http://www.spreadsheet-parts.org/
- [15]
-
Ireson-Paine, J.N.:
Excel science-fiction generator.
Ireson-Paine Spreadsheet Parts Repository Web page,
http://www.spreadsheet-parts.org/sf.html
File translated from
TEX
by
TTHgold,
version 4.00.
On 30 Mar 2011, 13:14.