The first phase parses the input, building an abstract syntax tree of
the WOM source file. This is a fairly straightforward recursive-descent
parser, but is complicated by the fact that the input can contain
arbitrary amounts of plain text between HTML or WOM elements.
Consequently, the lexical analyser must be able to distinguish WOM
elements from the rest of the text. It does this by searching for
< characters. If one is followed by a name, or by a / and a name,
the analyser assumes that it marks the start of either an HTML or a
WOM element.
It checks the name against a look-up table of WOM tags: anything not in
the table is taken to be HTML.
Thus in the example below, the elements <WOMForm>,
<TextField t> and </WOMForm> would be recognised as WOM,
and the rest as text which need not be analysed further. In principle,
the parser could analyse and error-check the HTML as well, but I haven't
bothered with this. There would have to be a way for the user to specify
which dialect of HTML was being used, given the number of variants
accepted by different browsers.
<H1>Demo</H1><WOMForm>Type input here<BR><TextField t></WOMForm>
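The scan described above can be sketched in a few lines of Python. This is only an illustration, not the compiler's own code: the contents of WOM_TAGS, the function name split_wom, and the case-insensitive comparison of tag names are all assumptions, and the sketch does not cope with a > character inside an argument value.

```python
import re

# Hypothetical look-up table of WOM tag names (assumed; the real table
# is not listed in the text). Names are compared case-insensitively,
# which is also an assumption.
WOM_TAGS = {"womform", "textfield"}

# A '<', an optional '/', a name, then everything up to the next '>'.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

def split_wom(source):
    """Split source into ('text', s) and ('wom', s) chunks, in order."""
    chunks = []
    text_start = 0
    for m in TAG_RE.finditer(source):
        if m.group(2).lower() not in WOM_TAGS:
            continue  # not in the table: HTML, so it stays part of the text
        if m.start() > text_start:
            chunks.append(("text", source[text_start:m.start()]))
        chunks.append(("wom", m.group(0)))
        text_start = m.end()
    if text_start < len(source):
        chunks.append(("text", source[text_start:]))
    return chunks
```

Applied to the example line, this yields the three WOM elements as separate chunks, with <H1>Demo</H1> and Type input here<BR> passed through as uninterpreted text.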
Having isolated a WOM element, the parser then analyses its arguments.
As explained in Section 2, to conform to the syntax of HTML,
the normal way of
specifying arguments is by keyword, so that <Text id=t value="">
and <Text value="" id=t> mean the same thing. However, some
arguments can also be passed by position. The most important of these is
the instance identifier: if the first argument is not keyworded, it is
assumed to be an instance identifier. So <Text t value=""> means
the same as the previous two examples.
In addition, the parser must deal with "defaultable booleans". At this
level, then, the parser needs to find non-keyworded arguments and
allocate them either a keyword (if they are positionals) or the default
value (if they are defaultable booleans). This part of the analysis is
table-driven, the tables being encoded in Lisp in a way which makes it
easy for the compiler-writer to add new WOM elements.
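As a rough illustration of this table-driven step, here is a sketch in Python rather than Lisp. The layout of ELEMENT_TABLE, the readonly boolean and its default value, and the function name normalise_args are all hypothetical; only the id positional of <Text> comes from the examples above. The sketch assumes the arguments have already been split into tokens, and it handles only arguments keyworded by =.

```python
# Hypothetical per-element table (assumed layout): the keywords given to
# positional arguments, in order, and the defaultable booleans together
# with the value each takes when written without a keyword.
ELEMENT_TABLE = {
    "Text": {
        "positionals": ["id"],
        "booleans": {"readonly": "true"},  # invented for illustration
    },
}

def normalise_args(element, tokens):
    """Return a keyword -> raw value mapping for one element's arguments."""
    spec = ELEMENT_TABLE[element]
    unused = list(spec["positionals"])
    args = {}
    for token in tokens:
        if "=" in token:
            # Already keyworded: split at the first '='.
            key, _, value = token.partition("=")
            args[key] = value
        elif token in spec["booleans"]:
            # Defaultable boolean: supply the defaulted value.
            args[token] = spec["booleans"][token]
        elif unused:
            # Positional: allocate the next unused keyword.
            args[unused.pop(0)] = token
        else:
            raise ValueError(f"unexpected argument {token!r} in <{element}>")
    return args
```

With this table, the tokens of <Text t value="">, <Text id=t value=""> and <Text value="" id=t> all normalise to the same mapping. Adding a new element means adding one entry to the table, which is the property the Lisp encoding is meant to provide.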
Having done this, the parser needs to analyse the values. To aid
error-checking, arguments are typed. For example, the size
argument to any of the fields must be an integer, so that
<TextField size=a> and <TextField size="10"> are invalid.
The latter would be legal if written as
<TextField size=10>. In the example above, the values are
almost all simple constants. This will normally be the case for
attributes keyworded by =, but not necessarily for those
keyworded by ::, as explained in
Section 2.22. Thus this stage of parsing
requires analysis and type-checking of general expressions.
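For the simple-constant case, the check on size might look like the following Python sketch. The ARG_TYPES table and the function check_value are hypothetical, and an integer is assumed here to mean an unquoted run of decimal digits. General expressions, such as those keyworded by ::, are not handled.

```python
# Hypothetical table of argument types; only the entry for size is
# taken from the text.
ARG_TYPES = {"size": "integer"}

def check_value(name, raw):
    """Check a raw argument value against its declared type.

    Returns the converted value, or raises ValueError on a mismatch.
    Arguments with no declared type are returned unchanged.
    """
    if ARG_TYPES.get(name) == "integer":
        # Assumption: an integer constant is an unquoted run of ASCII digits.
        if raw.isascii() and raw.isdigit():
            return int(raw)
        raise ValueError(f"{name} must be an integer, got {raw}")
    return raw
```

Under these assumptions, size=10 is accepted, while size=a and size="10" are both rejected, matching the behaviour described above.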