Stylechecking spreadsheets with Model Master

[ Jocelyn Ireson-Paine's Home Page | Publications ]

A prospective user of Model Master recently asked me to show them how it could be used to check spreadsheets against their rules for house style. I thought this might interest other readers, so have put the demo on the Web. The rule in question was that no spreadsheet should contain, in the same worksheet, more than one occurrence of the same formula. For example, a spreadsheet for plane-geometry calculations should not contain more than one copy of the formula

  =SQRT( POWER(A1,2) + POWER(A2,2) )
Their motivation was that complicated formulae should be localised in one place so that if they have to be changed, it isn't necessary to find and change more than one copy. In most programming languages, we'd overcome this by defining functions, but Excel doesn't allow this.

Automatic validation

Back to the demo. First, this is the tiny example spreadsheet to be validated. It contains three sheets, each with a few duplicated formulae. Some of these are just numbers, the rest are simple additions.

Next, here is a report of the duplicate formulae in this spreadsheet. Outputting as HTML is nice, because of its versatility for generating attractive and readable output. The duplicate formulae were found with MM's spreadsheet-access predicates. MM is coded in Prolog, and in Prolog, a formula is just a tree. Comparing two formula for exact equality takes one call; more sophisticated comparisons, e.g. for structural similarity, can easily done by recursively decomposing the formulae and matching subexpressions.

We can make another way to show the duplicates. Here, each worksheet is displayed as a table with the duplicate formulae set upon a coloured background. The backgrounds are colour-coded, all occurrences of the same formula sharing the same colour. This shows that we can generate output other than textual listings.

For very big spreadsheets, such a display loses its attractiveness, because the tables would be illegibly large. However, we could use the same information to colour the cells in the spreadsheet itself. This output file shows how we might do so. Let's assume that MM is running as a subprocess of a Visual Basic program itself invoked from the spreadsheet under test. Then it could send the validation results back to VB, either in a scratch file or down the subprocess output stream. VB could then read this and recolour the spreadsheet accordingly. In our output, each line begins with an integer denoting a colour. This is followed by a worksheet name, and finally by a sequence of cell addresses within that worksheet. These are the addresses of cells sharing the same formula. All VB has to do is to allocate a unique colour to each integer, and then colour all the cells on that line with this colour.

And just for reference, this is a listing of the entire spreadsheet.

Automatic rebuilding

Now, a really nice thing about MM is that it can not only detect violations, but correct them. Let's suppose we decide on a simple fix, where any formula that occurs more than once is moved to an unused cell at the end of the spreadsheet, and all occurrences are replaced by references to that cell. Difficult? No, and here is the example spreadsheet after such a fix. In some sense, this is the same spreadsheet as the original, in that it calculates the same results. It's therefore not surprising that we can use spreadsheet algebra to state an invariant and convert one into the other.

[ Jocelyn Ireson-Paine's Home Page | Publications ]