
The parser

The first phase parses the input, building an abstract syntax tree of the WOM source file. This is a fairly straightforward recursive-descent parser, but it is complicated by the fact that the input can contain arbitrary amounts of plain text between HTML or WOM elements. Consequently, the lexical analyser must be able to distinguish WOM elements from the rest of the text. It does this by searching for < characters. If a < is followed by a name, or by a / and a name, the analyser assumes it begins a tag (start or end) of either an HTML or a WOM element. It then checks the name against a look-up table of WOM tags: anything not in the table is taken to be HTML.

Thus in the example below, the tags <WOMForm>, <TextField t> and </WOMForm> would be recognised as WOM, and the rest as text that need not be analysed further. In principle, the parser could analyse and error-check the HTML as well, but I haven't bothered with this: there would have to be a way for the user to specify which dialect of HTML was being used, given the number of variants accepted by different browsers.

<H1>Demo</H1><WOMForm>Type input here<BR><TextField t></WOMForm>
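The lexical scan just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the compiler's actual code (which is written in Lisp): `WOM_TAGS` is a hypothetical, abbreviated stand-in for the look-up table of WOM tags, names are matched case-insensitively by assumption, and a tag is assumed to end at the first following `>`. HTML tags are left untouched as part of the surrounding text.

```python
import re

# Hypothetical, abbreviated look-up table of WOM tag names. Names are
# compared case-insensitively here; that is an assumption of the sketch.
WOM_TAGS = {"womform", "text", "textfield"}

# A '<' followed by a name, or by '/' and a name, begins a tag.
TAG_START = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def lex(source):
    """Split source into ('text', s) and ('wom', s) tokens.

    HTML tags are not analysed: they stay inside the surrounding text.
    A tag is assumed to end at the first '>' after its name.
    """
    tokens = []
    text_start = 0  # start of the pending run of plain text
    pos = 0         # where to resume searching for '<'
    while True:
        m = TAG_START.search(source, pos)
        if m is None:
            break
        end = source.find(">", m.end())
        if end == -1:  # unterminated tag: leave the rest as text
            break
        if m.group(1).lower() in WOM_TAGS:
            # Flush any plain text (including HTML) before the WOM tag.
            if m.start() > text_start:
                tokens.append(("text", source[text_start:m.start()]))
            tokens.append(("wom", source[m.start():end + 1]))
            text_start = end + 1
        pos = end + 1
    if text_start < len(source):
        tokens.append(("text", source[text_start:]))
    return tokens
```

Applied to the example above, `lex` returns the text `<H1>Demo</H1>`, the WOM tag `<WOMForm>`, the text `Type input here<BR>`, and then the WOM tags `<TextField t>` and `</WOMForm>`. Concatenating the tokens reproduces the input exactly, so nothing is lost by passing the HTML through unanalysed.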

Having isolated a WOM element, the parser then analyses its arguments. As explained in Section 2, to conform to the syntax of HTML, arguments are normally specified by keyword, so that <Text id=t value=""> and <Text value="" id=t> mean the same thing. However, some arguments can also be passed by position. The most important of these is the instance identifier: if the first argument is not keyworded, it is assumed to be an instance identifier. So <Text t value=""> means the same as the last two examples. In addition, the parser must deal with "defaultable booleans". At this level, then, the parser needs to find non-keyworded arguments and allocate to each either a keyword (if it is a positional) or its default value (if it is a defaultable boolean). This part of the analysis is table-driven, the tables being encoded in Lisp in a way that makes it easy for the compiler-writer to add new WOM elements.
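A table-driven treatment of non-keyworded arguments might look like the following Python sketch. The table layout, the function `normalise`, and the boolean argument `checked` are all hypothetical; the compiler's real tables are written in Lisp and their format is not given here. In this sketch, bare arguments are assigned to an element's positional keywords in the order they appear, a bare name listed as a defaultable boolean receives that boolean's default value, and all other values are kept as raw source text (quotes included) for the later type-checking stage. The :: form of keywording is not handled.

```python
import re

# Hypothetical argument tables, standing in for the compiler's Lisp tables.
# For each WOM element: the keywords that may be supplied by position
# (instance identifier first), and the defaultable booleans with their
# default values. "checked" is an invented example.
ARG_TABLES = {
    "womform": {"positionals": [], "booleans": {}},
    "text": {"positionals": ["id"], "booleans": {}},
    "textfield": {"positionals": ["id"], "booleans": {"checked": True}},
}

# One argument: an optional "name=" followed by a value, where a value is
# either a double-quoted string or a run of characters other than
# whitespace, '"' and '='. The :: form is not handled.
ARG = re.compile(r'\s*(?:([A-Za-z]\w*)\s*=\s*)?("[^"]*"|[^\s"=]+)')


def normalise(tag):
    """Return (element name, {keyword: value}) for a WOM start tag."""
    name, _, rest = tag.strip("<>").partition(" ")
    table = ARG_TABLES[name.lower()]
    positionals = iter(table["positionals"])
    args = {}
    rest = rest.strip()
    pos = 0
    while pos < len(rest):
        m = ARG.match(rest, pos)
        if m is None:
            raise SyntaxError(f"cannot parse arguments at {rest[pos:]!r}")
        keyword, value = m.group(1), m.group(2)
        if keyword is None:
            if value.lower() in table["booleans"]:
                # Bare defaultable boolean: supply its default value.
                keyword = value.lower()
                value = table["booleans"][keyword]
            else:
                # Otherwise allocate the next positional keyword.
                keyword = next(positionals, None)
                if keyword is None:
                    raise SyntaxError(f"unexpected positional argument {value!r}")
        if keyword in args:
            raise SyntaxError(f"argument {keyword!r} given twice")
        args[keyword] = value
        pos = m.end()
    return name.lower(), args
```

With these tables, the three <Text> examples above all normalise to `('text', {'id': 't', 'value': '""'})`. Adding a new element means adding a table entry rather than new parsing code, which is the property the Lisp tables are described as providing.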

Having done this, the parser needs to analyse the values. To aid error-checking, arguments are typed. For example, the size argument to any of the fields must be an integer, so that <TextField size=a> and <TextField size="10"> are both invalid; the latter would be legal if written as <TextField size=10>. In the examples above, the values are almost all simple constants. This will normally be the case for attributes keyworded by =, but not necessarily for those keyworded by ::, as explained in Section 2.22. Thus this stage of parsing requires analysis and type-checking of general expressions.
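Type-checking an =-keyworded constant could be sketched as below. Only the requirement that size be an integer comes from the text; the type table `ARG_TYPES`, the other types, and the function `check_value` are illustrative assumptions. The point the sketch captures is that quotation marks are significant, so the raw source text `"10"` is rejected where an integer is expected. General expressions keyworded by :: would need a real expression parser and are not covered.

```python
import re

# Hypothetical type table mapping argument names to expected types.
# Only "size must be an integer" comes from the text.
ARG_TYPES = {"size": "integer", "value": "string", "id": "identifier"}


def check_value(keyword, raw):
    """Check the raw source text of an '='-keyworded constant.

    Returns the converted value, or raises TypeError. Quotation marks are
    significant: "10" is a string, so it is not a valid integer.
    """
    expected = ARG_TYPES.get(keyword)
    if expected is None:
        raise TypeError(f"unknown argument {keyword!r}")
    if expected == "integer" and re.fullmatch(r"\d+", raw):
        return int(raw)
    if expected == "string" and len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    if expected == "identifier" and re.fullmatch(r"[A-Za-z]\w*", raw):
        return raw
    raise TypeError(f"{keyword}={raw} is not a valid {expected}")
```

For instance, `check_value("size", "10")` returns the integer 10, while `check_value("size", "a")` and `check_value("size", '"10"')` both raise `TypeError`, matching the three TextField cases above.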





Jocelyn Ireson-Paine
Fri May 30 14:03:06 BST 1997