[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

Programs That Transform Their Own Source Code; or: the Snobol Foot Joke

There is a collection of jokes at various places on the Web, for example Susan Stepney's How to Shoot Yourself In the Foot, that invites you to compare a C program for shooting yourself in the foot with programs in other languages. The C program is straightforward; those in the other languages have extravagantly elaborate effects. Except Basic, which manages only to soak your trousers with a water pistol. I was reminded of these jokes by Christopher Yeleighton's reply What about reflexion? to my New Year's resolutions about how to choose a programming language. Christopher asked whether I would use a language where I can't determine the contents of a function at run time. I do use such languages, but that's not interesting, so I'm going to show you how I program in one where I (almost) can. It's a beautiful and wonderfully mutable language called Snobol4, which enables you to pump out new statements at run time like a sausage-maker gone berserk, except that sausages are not executable. I hope that my article will teach you enough Snobol to understand the Snobol version of the foot joke.

In fact, Snobol doesn't allow you to determine the contents of a function at run time. But it lets you do something almost as good, because you can put the source code of the function into a string, then compile that. Which gives you both the function, and its source code. As a study in Snobol, I'm going to show you how the Snobol expert James Gimpel used this feature in his book Algorithms in Snobol4, to define a function for defining other functions. This function was named DEXP, short for "define function as expression", and you called it like this:

DEXP( "AVE(X,Y) = (X + Y) / 2.0" )
This defined a function named AVE which returned the average of X and Y. When you see how one normally codes functions in Snobol, you'll appreciate why DEXP was so useful.

Perhaps the capitals make it obvious that Snobol is an old language. The most recent version, Snobol4, was started in 1966, as the SNOBOL4.ORG -- SNOBOL History explains. Incidentally, Snobol was once called Sexi, String Extraction Interpreter. The name Snobol was apparently chosen after an implementor remarked that the language didn't have a snowball's chance in Hell. It's more fun than naming languages after coffee.

Old or not, Snobol has some interesting features. The most important is pattern matching, which provides concise notation for selecting and extracting parts of strings. I still prefer it to Perl regular expressions. Indeed, Snobol patterns are computationally more powerful, because they can be recursive, which regular expressions can't. Actually, I wrote a Snobol program for massaging some medical data only last week, running it on Catspaw Inc.'s free Snobol. I'll show you later in the article how to install this.

I'm now going to explain what you need to know in order to understand DEXP. Because Snobol is so unusual, I'll take this slowly. If you want to know more, I'd recommend Mark Emmer's A Snobol4 Tutorial.

First, Snobol is not free-format. If a statement begins with a label, that label must start in its first column. If the statement begins with an asterisk, it's a comment card. (And when was the last time you saw mention of cards in Dobbs Code Talk?) Otherwise, it must start with at least one space.

Second, a jump isn't a command as in most languages, it's part of a statement. Every statement has slots for either one or two jumps, which are always at the end of the statement. A jump to L is written ":(L)". So this code displays 1 and then 3:

  output = 1
  :(L)
  output = 2
L
  output = 3
The solitary ":(L)" is a statement that has only a jump slot but no body, and the solitary "L" is a statement that has only a label but no body. I could have attached these to neighbouring statements:
  output = 1 :(L)
  output = 2
L output = 3

This demonstrates a third point, output-associated variables. The statement "output = 1" is an assignment, but to a special variable that is, as Snobol programmers say, output-associated. It's connected to standard output, and when you assign to it, it displays the value assigned on a new line.

Because I'll need it in a later example, I'll now show you how to write loops. Snobol doesn't have curly brackets, begin-end keywords, or other such delimiters; and it doesn't have the predefined control structures one associates with them. Loops are something you must build for yourself. Which is good, because it gives me an excuse to demonstrate success and failure.

I'll start with an infinite loop:

  i = 1
loop
    i = i + 1
    output = i
    :(loop)
end
There's nothing new here, except to note that Snobol syntax requires binary operators such as "+" to be surrounded by spaces.

Next, here's a bounded loop:

  i = 1
loop
    gt( i, 10 ) :s( out )
    output = i
    i = i + 1
    :( loop )
out
  output = "Ended the loop."
end

This shows how to write conditionals. The "gt" is Snobol for "greater than". (Which is nice. It means I don't have to run through this article escaping greater-than signs for HTML.) It's a predicate, which can either succeed or fail. If it succeeds, Snobol will execute the statement's success goto. If it fails, Snobol will execute the statement's failure goto. If the statement lacks the appropriate one of these, control will just transfer to the next statement.

You'll have guessed that the "s" indicates ":s( out )" to be a success goto. So this is one way to jump out of a loop.

Having explained the basic concepts, I'll move on to function definition. In Snobol, function definition happens at run time, and is done by calling the "define" function. This takes a function prototype string as its first argument, and optionally a label as its second, and makes the label be the entry point for the function named in the prototype. If the optional label is omitted, it's assumed to have the same name as the function. Here's an example that defines f to be a function that, when called, jumps to label f. In other words, its body begins at f. That's the label f and not the function f, of course, because in Snobol, f can simultaneously name a label, a function, and a variable:

  define( "f(x)" ) :(f_end)
f
  f = x * x
  :(return)
f_end

  output = f(2)

end

Although "f_end" looks as though it has some special association with f, it doesn't. I put it there merely to implement a hop-around so that control doesn't fall into f. There is absolutely no compile-time distinction between the body of f and the rest of the program: the distinction only becomes apparent at run time, when Snobol executes the "define". Even then, the distinction is only dynamic, given by the built-in ":(return)" that returns from f.

To emphasise this, I'll demonstrate that I can call "define" more than once on the same function. Each time I call it, it defines the function anew:

  define( "f(x)" ) :(f_end)
f
  f = x * x
  :(return)
f_end

  output = f(2)
  
  define( "f(x)", "new_f" ) :(new_f_end)
new_f
  f = x * x * x
  :(return)
new_f_end

  output = f(2)

end

I've tested all my examples using Catspaw's free Snobol, which I installed on XP after downloading from the Snobol FTP site mentioned in the "SNOBOL4 FTP Site" section of Catspaw's SNOBOL4 and SPITBOL Information. Download snobol4p.zip and unzip into a clean directory. Copy my programs from this article, or download them from this zip file. This is Catspaw running the previous example:

C:\dobbs>\snobol4\SNOBOL4.EXE fredef.sno

Snobol+ Version 2.24. 8087 present.
(c) Copyright 1984,1998 Catspaw, Inc., Salida, Colo. All Rights Reserved.

No errors

4
8

Notice in my second call to "define", by the way, that I passed the label g as the string "g". Snobol has an unusual kind of dereferencing that converts strings to variables.

You can now appreciate with how much care one must define functions. If you want the call to "define" to appear next to its function's body — which you do, because otherwise the list of formal parameters will be a long way from the body, making it hard to read — you must remember to program a hop-around. You have to choose a suitable name for the hop-around label, and to jump to it, and you also need to label the function's entry point.

One way round this is a preprocessor. Back in the days of Fortran 66, there was a famous preprocessor called Ratfor, which translated structured Fortran with control structures into unstructured Fortran without. Here's the Ratfor example from 99 Bottles of Beer: one program in 1314 variations:

integer beers
integer part

beers=99
part=0

repeat
{
  if(part==0 || part==3)
  {
   if(beers==1)
    print *,'1 bottle of beer on the wall'
   else
     print *,beers,' bottles of beer on the wall'
     
    if(part==3)
      print *,'-'
 }
 else if(part==1)
 {
   if(beers==1)
    print *,'1 bottle of beer'
   else
     print *,beers,' bottles of beer'
 }
 else
 {
    print *,'take one down, pass it around'
    beers = beers - 1
 }
  part = Mod( part+1, 4 )
  
} until (beers==0 && part==0)

end
You can do the same with Snobol, as Andrew Koenig did with his Snocone preprocessor.

But if you can invoke your compiler at run time, another way round the problem is to use it plus string-hacking as a preprocessor. Which is what James Gimpel's DEXP function did. Here's his example again:

DEXP( "AVE(X,Y) = (X + Y) / 2.0" )

This defines a function AVE which returns the average of X and Y. But how does it work? There are three stages. First, using pattern matching, DEXP separates the AVE(X,Y) from the expression (X + Y) / 2.0. It converts the AVE(X,Y) into a function prototype, and the expression into a function body, both stored as strings. Second, it compiles the function body by passing it to the built-in "code" function. Third, it calls "define". This defines the new function, hooking it up to a label which DEXP has prepended to the function body.

Here's the definition of DEXP, rewritten in lower case for readability:

  define( 'dexp(proto)name,args' ) :(dexp_end)
dexp
  proto pos(0) span(' ') =
  proto break('(') . name bal . args = name
  code( name ' ' proto ' :s(return)f(freturn)' )
  define( name args )
  :(return)
dexp_end

  dexp( 'ave(x,y) = (x + y) / 2.0' )

  output = ave( 3, 4 )

end

Running it gave me this output:

C:\dobbs>\snobol4\SNOBOL4.EXE dexp.sno

Snobol+ Version 2.24. 8087 present.
(c) Copyright 1984,1998 Catspaw, Inc., Salida, Colo. All Rights Reserved.

No errors

3.5

I ought now to explain pattern matching and the "code" function. I'll do pattern matching first. The body of DEXP contains two pattern-matching statements:

  proto pos(0) span(' ') =
  proto break('(') . name bal . args = name
Both contain a subject string and a pattern, separated by a space. One problem with Snobol is that it uses space for several different purposes. In these statements, the space after "proto" separates the subject from the pattern. The spaces after "pos(0)" and after "break('(') . name", however, denote concatenation. They join two subpatterns together to make a composite pattern.

The first component of the first pattern, "pos(0)", tells the pattern matcher to try matching only at the first character in "proto". This is needed because Gimpel was writing his code to work in unanchored mode, wherein the matcher tries matching at the first character, the second, the third ... and so on until the match succeeds. The second component, "span(' ')", matches a sequence of spaces. And the "=" means that if the match succeeds, the matcher must replace the substring matched by nothing. So the match replaces any sequence of spaces starting at the first character by nothing. That is, it removes leading spaces from "proto".

In the second pattern, there are two components: "break('(') . name", and "bal . args". Here, "name" and "args" are variables. The dot is the conditional assignment operator. "A . B" tells the matcher that if the entire match of which pattern A is a part succeeds, then the substring matched by A must be assigned to variable B. So here, if the entire match succeeds, the substring matched by "break('(')" must be assigned to "name". Similarly, the second part of the pattern directs that the substring matched by "bal" must be assigned to "args".

What, then, do "break('(')" and "bal" do? Well, the "break" scans up to an opening bracket. And the "bal" matches any non-empty string which is balanced with respect to brackets. Like "pos" and "span", these are built-in pattern-valued functions. So consider executing:

  proto break('(') . name bal . args = name
where "proto" is:
  AVE(X,Y) = (X + Y) / 2.0
The "break" matches AVE and assigns it to "name". The "bal" matches the argument list "(X,Y)" and assigns it to "args". Then the "=" replaces these matched substrings by "name". This makes "proto" be:
  AVE = (X + Y) / 2.0
For more on pattern matching, see PATTERNS AND PATTERN FUNCTIONS in A Snobol4 Tutorial.

DEXP then obeys the statements:

  code( name ' ' proto ' :s(return)f(freturn)' )
  define( name args )
(The space between subexpressions here is concatenation, as it was between pattern components.) With the values of the variables calculated above, these statements become:
  code( "AVE AVE = (X + Y) / 2.0 :s(return)f(freturn)" )
  define( "AVE(X,Y)" )
So "code" receives a string that looks like a Snobol statement starting with the label AVE, containing the body "AVE = (X + Y) / 2.0", and ending with the jumps ":s(return)f(freturn)". And the "define" defines function AVE to have the entry point AVE.

The jumps, by the way, mean that if the body succeeds, the function should return and itself succeed. If the body fails, the function should return and itself fail. With that in mind, if you look back at the definitions of "dexp" or "f" or "new_f", you can see how the argument to "code" resembles a function body.

So I merely need to add that "code" takes a string and compiles it. But how do you execute the result? Answer: via its initial label. Executing "code" redefines all the labels in the argument so that they point into the new code. Isn't that wonderful?

I'm going to demonstrate with the following program:

  :(L)
M
  code( "L output = 'At second L.' :(N)" )
  :(L)

L
  output = "At first L."
  :(M)

N
  output = "About to end."
end

Can you see what it does? Here's the output:

C:\dobbs>\snobol4\SNOBOL4.EXE code.sno

Snobol+ Version 2.24. 8087 present.
(c) Copyright 1984,1998 Catspaw, Inc., Salida, Colo. All Rights Reserved.

No errors

At first L.
At second L.
About to end.

And just to make the point in the most blatant way, here's a program that runs round in a loop compiling new statements that chain onto one another, and then jumps to the first. This is why, at the beginning of this article, I showed you how to write a loop:

  i = 1

loop1
    gt( i, 10 ) :s( out1 )
    label = "L" i
    body = " output = 'In statement at " label ".'"
    goto = lt( i, 10 ) ":(L" (i + 1) ")"
    goto = eq( i, 10 ) ":(end)"
    statement = label " " body " " goto
    output = "Made this statement: " statement "."
    code( statement )
    i = i + 1
    :( loop1 )
out1
  
  output = "Going to L1."
  :( L1 )

L1
* This label will be ignored, because
* the first call to 'code' will redefine it.

end
When you write self-generating code like this, it's easy to get into trouble because you want to nest a quoted string inside a quoted string inside a quoted string. We really do need more than two kinds of string quote, and I'd like to propose the notation '1, '2, '3 ... whereby we attach index numbers to quotes. So
s = '1 t = '2 u = '3 v = '4A FINAL VALUE'4'3'2'1
would assign to s the string whose delimiting quotes are "'1", and whose contents is " t = '2 u = '3 v = '4A FINAL VALUE'4'3'2". Converting this to code and obeying it would assign to t the string whose delimiting quotes are "'2", and whose contents is " u = '3 v = '4A FINAL VALUE'4'3". And so on.

Oddly, none of the style guides I've read recently have complained that the paucity of quotes in today's languages is a problem. Their authors clearly don't take self-modifying code seriously. At any rate, this is the output from my program:

C:\dobbs>\snobol4\SNOBOL4.EXE code2.sno

Snobol+ Version 2.24. 8087 present.
(c) Copyright 1984,1998 Catspaw, Inc., Salida, Colo. All Rights Re

No errors

Made this statement: L1 output = 'In statement at L1.' :(L2).
Made this statement: L2 output = 'In statement at L2.' :(L3).
Made this statement: L3 output = 'In statement at L3.' :(L4).
Made this statement: L4 output = 'In statement at L4.' :(L5).
Made this statement: L5 output = 'In statement at L5.' :(end).
Going to L1.
In statement at L1.
In statement at L2.
In statement at L3.
In statement at L4.
In statement at L5.

To my mind, the last two programs show ample justification for the Snobol version of the foot joke. Here, from the December 1991 Developer's Insight via Susan Stepney's How to Shoot Yourself In the Foot are how some well-known languages would handle the task:

C
You shoot yourself in the foot.
C++
You accidently create a dozen instances of yourself and shoot them all in the foot. Providing emergency medical assistance is impossible since you can't tell which are bitwise copies and which are just pointing at others and saying "That's me, over there."
COBOL
USEing a COLT 45 HANDGUN, AIM gun at LEG.FOOT, THEN place ARM.HAND.FINGER on HANDGUN.TRIGGER and SQUEEZE. THEN return HANDGUN to HOLSTER. CHECK whether shoelace needs to be retied.
FORTRAN
You shoot yourself in each toe, iteratively, until you run out of toes, then you read in the next foot and repeat. If you run out of bullets, you continue anyway because you have no exception-handling facility.
Algol
You shoot yourself in the foot with a musket. The musket is aesthetically fascinating and the wound baffles the adolescent medic in the emergency room.
PL/I
You consume all available system resources, including all the offline bullets. The Data Processing & Payroll Department doubles its size, triples its budget, acquires four new mainframes and drops the original one on your foot.
BASIC
Shoot yourself in the foot with a water pistol. On big systems, continue until entire lower body is waterlogged.

And in Snobol?

Snobol (alternative)
You grab your foot with your hand, then rewrite your hand to be a bullet. The act of shooting the original foot then changes your hand/bullet into yet another foot (a left foot).

Susan notes that:

Of course, it didn't end there; there are many extensions to this idea (some included below). What many fail to recognise, however (especially those that add more complicated options for C, or reorder the list) is the meta-joke. Given the first line, the list starts off looking like yet another insult to C. But after reading the whole list, and coming back to the beginning, it becomes clear this is actually a compliment to C!

But I see it as a compliment to Snobol too. Because in which other language can you rewrite a hand to be a bullet, a hand/bullet to be a foot, or a label to be another label? Will C ever implement "code" for me?