[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

How to Document a Spreadsheet: an Exercise with Cash-Flow and Loans

Following Gliders, Hasslers, and the Toadsucker: Writing and Explaining a Structured Excel Life Game, I've been trying more structured spreadsheet programming. This time, a cash-flow spreadsheet which defines several loans — imagine them as credit cards — and, each month, selects the best loan for the amount the user wants to borrow. I'm pleased with it, because it came out as really nice functional code, and the spreadsheet neatly depicts the loan-selection mechanism, Excel tables complementing a textual program. And, as an antidote to the horror story below, I think it shows how one should document a spreadsheet.

How to document spreadsheets

I once worked with a huge great 400,000-cell hulk of a spreadsheet. It had been designed by administrators at a certain university, and was so big that the University Surveyor moved its users to the ground floor to avoid them falling into the room below. And it had no documentation to explain any of its cells. I tell a lie — three cells involved with validating the calculations did have comments. As for the rest, the documentation was an email or two, plus three pages of notes taken during a meeting. Some people don't want the world to know how their programs work.

That is why I'm posting this article. The loans spreadsheet is an example I programmed in my Excelsior spreadsheet-generator — free download here and explained in Gliders, Hasslers, and the Toadsucker— as a rational reconstruction of an Excel project-planner a customer had written. I've simplified it in order to demonstrate principles; it would need enhancement for real-world use. The loans, for example, don't have interest or monthly repayments. But the Excelsior code and my commentary, do show how one should document a spreadsheet. Even, I want to emphasise, if you're not using Excelsior.

Why?

First, because in Excelsior, you program in terms of tables. These act much as functions do in other languages, and you can give them meaningful names such as "total_cash_at_end_of_period". If you do so, you can write code such as:

total_cash_at_end_of_period[ t ] =
  total_cash_at_start_of_period[ t ] - expenses_during_period[ t ].

Such code is clear on its own. But secondly, by writing comments, you can precisely specify your intent. For example, when later on I describe the data that defines loans, I write of the "ceiling" and "has_ceiling" tables:

// ceiling[l] is loan l's ceiling. That is, the user can borrow
// any amount B such that B ≤ ceiling[l]. If
// not( has_ceiling[l] ), the user can borrow without limit.
That seems to me even clearer.

But what if you don't want to code in Excelsior, but to develop directly in Excel?

Well, you should still write comments. Choose a meaningful name for each group of logically-related cells, and either assign it as a named range, or indicate in some other way which cells it names. Then you can use it in comments such as that above for "ceiling", and you can type those into Excel as cell comments, or write them in a separate program-specification document. And you can use Excelsior code, or whatever similar notation you fancy, as pseudo-code. Transcribe difficult formulae into it, and include the result in your comments. That will make them much easier to understand than if the reader has to dissect them out of the formula bar.

Why not?

Modelling cash flow

Now I'll proceed with the spreadsheet, starting with the easiest part, the cash-flow tables. Let's assume we want to forecast cash flow over monthly periods, displaying two tables. One shows total cash at the start of each period; one, total cash at the end of each period. It's the recession, so let's assume there's no income, only expenses. Then total cash at the end of each period equals total cash at the start of each period minus expenses during the period. We might write this mathematically as:

total_cash_at_end_of_period(t) =
  total_cash_at_start_of_period(t) + expenses_during_period(t).

Here, I'm viewing total_cash_at_end_of_period, total_cash_at_start_of_period and expenses_during_period as functions that map a period indicator t to money. It doesn't matter what t is as long as it uniquely identifies periods. For example, it could be an integer index, with 1 meaning 1st January 2009 to 31st January 2009, 2 meaning 1st February 2009 to 28th February 2009, and so on.

Excelsior uses a similar notation, explained in Gliders, Hasslers, and the Toadsucker.

In this notation, the three identifiers are the names of "tables", which Excelsior will map to groups of cells on worksheets. We must state the equations that define how the tables' elements are calculated, and also say how big the tables are and which values their indices range over. Doing so, and adding two equations for cash at the start of a period, I get:

type time_range = 1:12.

table time : time_range -> date.
//
// time[t] is the date of the first day of
// period t. The final day of period t is
// the final day of its month, which is
// calculated below.

time[ t ] =
  date( 2009, t, 1 ).


table initial_cash : -> currency.
//
// initial_cash[] is the "opening cash balance",
// the value for the first period's total_cash_at_start_of_period.

table expenses_during_period : time_range -> currency.
//
// expenses_during_period[t] is the expenses incurred during
// period t: during the first day to the final day inclusive.

table total_cash_at_start_of_period : time_range -> currency.
table total_cash_at_end_of_period : time_range -> currency.
//
// total_cash_at_start_of_period[t] is the total cash
// held at the beginning of the first day of period t.
// Similarly, total_cash_at_end_of_period[t] is the total cash
// held at the end of the final day of period t.

total_cash_at_start_of_period[ 1 ] =
  initial_cash[].

total_cash_at_start_of_period[ t>1 ] =
  total_cash_at_end_of_period[ t-1 ].

total_cash_at_end_of_period[ t ] =
  total_cash_at_start_of_period[ t ] - expenses_during_period[ t ].

In this code, there are five tables and four equations. The first table, "time", is like a function that maps a period indicator to a date: t varies from 1 to 12, and the date for any t is the first day of the t'th month of 2009. This equation calls Excel's "date" function: for example, date(2009,2,5) is 5th February 2009. I don't use this table in the calculations but to generate captions for the other tables. I'll explain this when I talk about layout.

I hope the other tables are clear from their names and the comments. Notice how careful I've been to say explicitly how the tables relate to the time periods. This is important! How many of us have coded half a cash-flow forecaster, only to discover an off-by-one error in our indexing, with the effect that — for example — cash for month m equals cash for month m minus expenses for month m. To which the only solution is that cash be either infinity or minus infinity: neither physically possible, even if the latter often feels as though it is.

That's all the equations and tables I need for this first spreadsheet, so let's see what it looks like. This is the spreadsheet, and here is a picture of it:

In this picture, the money values are at most £100. That's merely to narrow the cells so I can squeeze this picture into the Dobbs blog column width. For the same reason, I've shortened the dates to DD/MM/YY format.

To get the spreadsheet, I also had to tell Excelsior how to lay the tables out. But because this posting is about documenting calculations, I'll omit the details. You can see my layout statements in the first program file, cash1.exc, linked in the next paragraph. One point I will make is that these statements tuck the "time" table out of the way on a sheet called Time. Whenever I want to use the dates in it as captions, I copy them with a "copy(time)" argument in a layout statement. I explain about "copy" in my Excelsior installation and user guide; essentially, it separates how captions are calculated from how they are displayed. As Phil Bewig points out in his paper How do you know your spreadsheet is right? Principles, Techniques and Practice of Spreadsheet Style, separating calculation from display is a Good Thing.

If you want to try generating your own spreadsheet, download Excelsior from here and install it as described in Gliders, Hasslers, and the Toadsucker. Then download cash1.exc and cash_data.exc into the directory you installed into. Then obey the following command, but with c:\loans\ replaced by the directory you installed into:

excelsior -t c:\loans\template.xlt cash1.exc cash_data.exc
The second file, incidentally, contains test data for the initial cash and the monthly expenses. Excelsior merges these before compilation, and this merging allows test data to be kept separate from the rest of your program. This is a simple kind of modularisation, which allows the same program to tested with many different datasets. That also is a Good Thing.

Adding loans to the program: the interface with cash flow

I'm now going to add loans to my program. As a first step, I'll incorporate a borrowing term into the total_cash_at_end_of_period equation:

total_cash_at_end_of_period[ t ] =
   total_cash_at_start_of_period[ t ] - expenses_during_period[ t ] +
   actually_borrowed_during_period[ t ].

I'll also declare two more tables. Into one of them, the user types the amount they want to borrow in any period. The other, actually_borrowed_during_period, will hold the amount the loans allow them to borrow. In this first-step version, I'll make the two equal:

table want_to_borrow_during_period : time_range -> currency.
//
// want_to_borrow_during_period[t] is the amount
// the user wants to borrow during period t.

table actually_borrowed_during_period : time_range -> currency.
//
// actually_borrowed_during_period[t] is the amount
// the loans enable the user to borrow, taking
// borrowing limits into account. This will be less than
// or equal to want_to_borrow_during_period[t].

actually_borrowed_during_period[ t ] =
  want_to_borrow_during_period[ t ].

In the image below, you can see I've typed amounts of £20 and £10 into want_to_borrow_during_period. Notice that these are mirrored in actually_borrowed_during_period:

If you want to try yourself, download cash2.exc and compile it like this:

excelsior -t c:\loans\template.xlt cash2.exc cash_data.exc

Adding loans to the program: the loans themselves

Now, I'm going to add the loans stuff and modify the way actually_borrowed_during_period depends on want_to_borrow_during_period. First, because I no longer want them to be equal, I remove that equation I added:

actually_borrowed_during_period[ t ] =
  want_to_borrow_during_period[ t ]
.
This equation is in the file cash2.exc that I just mentioned, but it won't be in the version I'm going to work with from now on, cash3.exc.

And secondly, I implement the loans. We shall have a fixed number of these. Each loan is like a credit card: you can borrow at any time, as long as the total you have borrowed doesn't exceed a ceiling. In the spreadsheet that inspired mine, there was also a "catch-all" loan with no ceiling: you could, in theory, borrow an infinite amount. I'll allow such loans too. So we need to define the number of loans, the ceilings of those that have ceilings, and a flag to say which do and which don't:

constant no_of_loans = 4.

type loans_range = 1:no_of_loans.

table has_ceiling : loans_range -> boolean.
//
// has_ceiling[l] is true if loan l has a ceiling.

table ceiling : loans_range -> currency.
//
// ceiling[l] is l's ceiling. That is, the user can borrow
// any amount B such that B ≤ ceiling[l]. If
// not( has_ceiling[l] ), the user can borrow without limit.
I can change no_of_loans to anything I want, so I could have six loans or a hundred and twenty-three. This is the spreadsheet, and here is the sheet that contains these tables:

Do you see how I made has_ceiling distinct from the ceiling? I prefer this over a "nonsense value" convention such as that if a ceiling is -1 or #N/A, then it is infinite. Spreadsheeters often do rely on such nonsense values, but using a separate group of cells produces code that's easier to read and modify.

Now that I've described the parameters defining a loan, I shall model the loans' progress through time. First, I need an initial value, because although my cash-flow forecast starts at the date held in time[1], the loans could have been arranged and borrowed from before that. So I shall declare a table analogous to initial_cash. Actually, you'll have seen it in the picture above:

table initial_loan : loans_range -> currency.
//
// initial_loan[l] is the total already borrowed
// from loan l at time[1].

Now I define tables analogous to my total_cash tables. These record the progress of each loan over time:

table total_loan_at_start_of_period : loans_range time_range -> currency.
table total_loan_at_end_of_period : loans_range time_range -> currency.
//
// total_loan_at_start_of_period[l,t]
// is the total lent by loan l at the start of
// period t, i.e. the total borrowed from that loan.
// Similarly, total_loan_at_end_of_period[l,t]
// is the total lent by loan l at the end of
// period t.

total_loan_at_start_of_period[ l, 1 ] =
  initial_loan[ l ].

total_loan_at_start_of_period[ l, t>1 ] =
  total_loan_at_end_of_period[ l, t-1 ].

total_loan_at_end_of_period[ l, t ] =
  total_loan_at_start_of_period[ l, t ] + lent_during_period[ l, t ].

The equation for total_loan_at_end_of_period adds in a value for lent_during_period. This is the core of my spreadsheet, for it's here that we record which loan lends during each period, and how much. I'll get on to that later; for the moment, I'll just declare the table and comment it:

table lent_during_period : loans_range time_range -> currency.
//
// lent_during_period[l,t] is the amount lent by
// loan l in period t.

Before explaining how lent_during_period works, I'm going to jump to another part of the calculation. This is where we decide which loans can be borrowed from at any time:

table can_supply_wants : loans_range time_range -> boolean.
//
// can_supply_wants[l,t] is true if loan l is capable of lending
// what the user wants to borrow at period t. This
// does not imply that we will in fact use the loan.

can_supply_wants[ l, t ] =
  or( not( has_ceiling[l] )
    , want_to_borrow_during_period[t] + total_loan_at_start_of_period[l,t] <= ceiling[l]
    ).
I like to read this as: can_supply_wants is a table of Boolean (logical) values. The element can_supply_wants[l,t] is true if l has no ceiling, or if the amount the user wants to borrow, plus the amount lent at the start of the period, is less than or equal to l's ceiling. As with "date" in my definition of "time", "if" and "or" are the Excel functions.

This table tells us which loans the user can borrow from, but how do we decide which they will in fact borrow from? To keep things simple, let's assume it's the first eligible loan, the one with the smallest l. I realised that, because I arrange can_supply_wants with time running vertically, and l going from left to right with no gaps — as we'll see when I discuss layout — I can find the first "true" by scanning with Excel's "match" function:

table first_that_can_supply_wants : time_range -> general.
//
// first_that_can_supply_wants[l,t] is the first l
// for which can_supply_wants[l,t] holds.
// It is 1 if the user wants to borrow zero in period t
// (this is consistent), and #N/A if can_supply_wants[l,t]
// is false for all l, i.e. if no loan can supply what
// the user wants.

first_that_can_supply_wants[ t ] =
  match( true[], can_supply_wants[ all, t ], 0 ).
Here is a picture of first_that_can_supply_wants, which I've laid out next to the cash tables. If you compare it with the ceilings above, you'll see that the first loan with sufficient funds has been selected each time:

And here is the table that first_that_can_supply_wants matches against, can_supply_wants, with its trues and falses. You can also see lent_during_period, showing the amount each loan has lent, and when:

There is one little trick here. I was going to make Excelsior recognise the names "true" and "false" as built-in constants denoting the Excel values of the same name. But when writing this, I realised I'd forgotten to. So as a hack, I've defined a table whose only element is the result of evaluating 1=1, which of course is true:

macro table true : -> boolean.
//
// true[] is the Boolean value "true".

true[] = 1=1.

The table "true" is a special kind of table, that I call a "macro table". If the right-hand side of an equation references normal table elements, Excelsior replaces these by the equivalent cells or cell ranges, then passes the resulting formula to Excel. But it treats the elements of macro tables differently, just by replacing the element reference by the right hand side of the equation defining them. So in this program, Excelsior will replace any reference to true[] by 1=1. You can see this in the formula bar, displaying the formula in first_that_can_supply_wants[1]:

The word "macro", incidentally, has the same origins as in "Excel macro". In the early days of assembler programming, a macro was a command that looked like a single assembler instruction, but that the assembler actually expanded into a sequence of instructions. The years have widened its usage, "Excel macro" being a case of this. Similarly, I use the word because Excelsior expands references to macro tables, making subformulae to be included within a complete formula.

Let us now return to first_that_can_supply_wants and link it to lent_during_period. I said earlier that lent_during_period[l,t] is the amount lent by loan l in period t. Now I can calculate it:

lent_during_period[ l, t ] =
  if( isna( first_that_can_supply_wants[t] )
    , 0
    , if( l = first_that_can_supply_wants[ t ]
        , want_to_borrow_during_period[ t ]
        , 0
        )
    ).
I shall read this as: the amount lent by loan l in period t is 0 if first_that_can_supply_wants[t] is #N/A: that is, if the match found no trues. Otherwise, if first_that_can_supply_wants[t] is l, it is the amount the user wants to borrow. Otherwise it is 0.

Ich am eldre and ek magti! At last, I can define actually_borrowed_during_period:

actually_borrowed_during_period[ t ] =
  sum( lent_during_period[ all, t ] ).
This is just the sum over all loans of the amount lent during period t.

As with the earlier spreadsheets, I shan't discuss the layout statements. You can find them in these files below, from which you can compile this spreadsheet. They are: cash3.exc; loans.exc; and loans_data.exc. Compile them like this:

excelsior -t c:\life\template.xlt cash3.exc loans.exc cash_data.exc loans_data.exc

Excel as a programmable crystal

One final picture, of all the tables on the Loans sheet. I winched the number of loans up to 100, the number of periods up to 120, and defined the data so each loan would support one lend: