To teach a friend how compilers work, I've written a compiler for a small subset of Pascal. It lexically analyses and parses a program, code-generates the syntax tree into code for a stack machine not unlike the Ocode virtual machine once used by C's ancestral language BCPL, and interprets the code. I'm going to write a few essays about the compiler and its target machine; in this first one, I'll demonstrate how a stack machine evaluates expressions.

A good place to start is reverse Polish. As many readers will know, this is a notation for expressions whereby all operators and functions follow their arguments. It's important because once we have the reverse Polish form of an expression, we can trivially transcribe it into a program that evaluates the expression.

Here's a very simple example. The reverse Polish form of the expression

11 * 22is

11 22 *

This works with any number of operators. So, for example, the reverse Polish form of the expression

11 * 22 + 33is

11 22 * 33 +

Notice that this is the reverse Polish for `E + 33`

, which
is `E 33 +`

, with `E`

replaced by
the reverse Polish for `11 * 22`

. Reverse
Polish has this lovely recursive structure.

What's magic about this is that we can transcribe any reverse-Polish expression into a machine-code program that evaluates the expression. Providing, that is, that our machine has a stack.

So,
assume a stack. Let
there be a `loadconst`

instruction,
which
pushes a constant onto the stack. And
let there be an `add`

instruction, which pops the top
two elements off the stack, adds them, and pushes
the result back. Let `mult`

operate
similarly. And let us transcribe the reverse
Polish above, replacing constants by
`loadconst`

, and renaming operators
the corresponding instruction.

Doing this,

11 22 * 33 +becomes

loadconst 11 loadconst 22 mult loadconst 33 add

I fed the instruction sequence above to my stack-machine interpreter, and this is what it displayed. The stack, which starts off empty, is shown as a list, with the topmost element on the left:

Stack: [] About to do OP(loadconst, 11) Stack: [11] About to do OP(loadconst, 22) Stack: [22, 11] About to do OP(mult) Stack: [242] About to do OP(loadconst, 33) Stack: [33, 242] About to do OP(add) Stack: [275]

Here's a bigger example. I'm going to evaluate this expression:

((11 * 22 + 33 * 44) / 121 = 14) and (10 = 20 / 2)

And here is the same expression rewritten in reverse Polish. The line breaks have no significance, they just stop me overflowing the blog's right-hand margin:

11 22 * 33 44 * + 121 / 14 = 10 20 2 / = and

This is the reverse Polish transcribed into machine instructions:

loadconst 11 loadconst 22 mult loadconst 33 loadconst 44 mult add loadconst 121 div loadconst 14 eq loadconst 10 loadconst 20 loadconst 2 div eq logand

I've used a greater variety of
instructions here. The `div`

instruction divides the element next to
the top of the stack by the element on the
top, popping them and
pushing back the integer
quotient.
The `eq`

instruction
compares the top two elements, pops them, and
pushes back 1 (signifying TRUE) if they are equal, 0
(signifying FALSE)
otherwise. And the `logand`

instruction does a logical "and", popping
the top two elements and pushing
back 1 if they are both 1, otherwise 0.

Now here is my interpreter interpreting those instructions:

Stack: [] About to do OP(loadconst, 11) Stack: [11] About to do OP(loadconst, 22) Stack: [22, 11] About to do OP(mult) Stack: [242] About to do OP(loadconst, 33) Stack: [33, 242] About to do OP(loadconst, 44) Stack: [44, 33, 242] About to do OP(mult) Stack: [1452, 242] About to do OP(add) Stack: [1694] About to do OP(loadconst, 121) Stack: [121, 1694] About to do OP(div) Stack: [14] About to do OP(loadconst, 14) Stack: [14, 14] About to do OP(eq) Stack: [1] About to do OP(loadconst, 10) Stack: [10, 1] About to do OP(loadconst, 20) Stack: [20, 10, 1] About to do OP(loadconst, 2) Stack: [2, 20, 10, 1] About to do OP(div) Stack: [10, 10, 1] About to do OP(eq) Stack: [1, 1] About to do OP(logand) Stack: [1]

The two equality comparisons both returned TRUE, that is 1; and the logical "and" combined these, pushing a final 1 onto the stack.

In my next essay on this topic, I'll extend the machine to implement variables, jumps, and input and output. By the way, it has not been unknown for people to code directly in reverse Polish — as The Evolution of RPN & Numeric Entry in Hewlett-Packard's The Museum of HP Calculators tells us.