[ Jocelyn Ireson-Paine's Home Page | Publications | Dobbs Code Talk Index | Dobbs Blog Version ]

How to Generate a Tree-Building Parser in Java using JJTree

JavaCC is a parser-generator for Java. JJTree augments JavaCC: it makes the generated parser build a syntax tree, without you having to program the tree-building yourself. I once wrote an Introduction to JJTree; and I've just had an email requesting a JJTree example with which a complete beginner could get started. So I decided to blog one. It shows you how to generate a tree-building parser for a language of simple arithmetic expressions.

I'll start right at the beginning by showing you how to get JavaCC and JJTree. They have moved around the Web quite a lot over the past few years, and are now at java.net. I downloaded them as a zip file from javacc.dev.java.net, by clicking on the javacc-4.2.zip link under the heading "JavaCC 4.2 Released!" near the top of the page. I then unzipped this into c:\javacc4.2 on my laptop. (I'm running Windows XP.)

There are several subdirectories, containing documentation, examples, and batch files for running both programs. I'll demonstrate how to use these shortly, but first, I need a parser.

Looking around the Web, I found a nice example at Lund University's Department of Computer Science. There are several copies: the one I used is at Calc parsing example in JavaCC and JJTree. From the name at the bottom of the page, it may have been written by Görel Hedin.

This example comes in two files. One is the parser itself. The other is a class whose "main" method creates an instance of this parser, makes it read from System.in, and invokes it to parse input according to one of the grammar rules, "Start". It then displays the parse tree it's built from this input, which is why it's in a class called "CalcDumpTree".

The parser is for a language of simple arithmetic expressions. Here are some examples. The input shows what I typed into the parser, and the parse tree is what the parser displayed:
InputParse treeMeaning
1.2 Start
 Exp
  Factor
   FPLitExp
1.2 is an expression which is a factor which is a floating point literal.
X Start
 Exp
  Factor
   IDExp
"X" is an expression which is a factor which is an identifier.
1.2 * X Start
 Exp
  Factor
   FPLitExp
  MulExp
   Exp
    Factor
     IDExp
"1.2 * X" is an expression which is a factor (which is an identifier) followed by a multiplicative expression (which is an expression which is a factor which is an identifier).

Now I'll show you the parser code. I got it from the calc.jjt link near the top of Calc parsing example in JavaCC and JJTree. I put it into a file called c:\dobbs\CalcParser.jjt, which you can get here. Before doing so, I made three small edits. One edit just inserted the name of my file. The second completed the example in the comment: the original was missing two "ni" keywords. And the third inserted a "package dobbs;" statement after the line "PARSER_BEGIN(CalcParser)", to be consistent with the directory I am working in. So here's the code:

/* CalcParser.jjt */


/* This example shows how to specify a simple parser for a toy calculator
   language with floats, multiplication, and let-expressions.
   Here is an example program in this language:
   
      let radius = 5.0 in
        let pi = 3.14 in
          2.0 * pi * radius
        ni
      ni
*/


/* *** Specifcation of the parser class *** */

PARSER_BEGIN(CalcParser)

package dobbs;

public class CalcParser {}

PARSER_END(CalcParser)

/* *** Token specification *** */

/* Skip whitespace */
SKIP : { " " | "\t" | "\n" | "\r" }

/* Reserved words */
TOKEN [IGNORE_CASE]: {
  < LET: "let">
| < IN: "in">
| < NI: "ni">
}

/* Literals */
TOKEN: {
  < FPLIT: (["0"-"9"])+ "." (["0"-"9"])+ > // Floating point literal
}

/* Operators */
TOKEN: {
  < ASSIGN: "=" >
| < MUL: "*" >
}

/* Identifiers */
TOKEN: {
  < ID: (["A"-"Z", "a"-"z"])+ >
}

/* *** Context-free grammar (EBNF) *** */

/* Each nonterminal is written like a kind of method, containing all its
   productions. JavaCC will generate a parsing method for each nonterminal.

   Note: In the start nonterminal, the action "return jjtThis;" instructs
   JavaCC to return the resulting parse tree from the generated parsing
   method. Therefore the start nonterminal has a result type (SimpleNode).
   All other nonterminals have no result type (void).
*/

/* The start nonterminal and its productions. */

SimpleNode Start() : {} // Start -> Exp
{
  Exp()
  { return jjtThis; }
}

/* Other nonterminals and their productions */

void Exp() : {} // Exp -> Factor [MulExp]
{
  Factor() [ MulExp() ]
}

void Factor() : {} // Factor -> LetExp | FPLitExp | IDExp
{
  LetExp()
| FPLitExp()
| IDExp()
}

void LetExp() : {} // LetExp -> "let" IDExp "=" Exp "in" Exp "ni"
{
  <LET> IDExp() "=" Exp() <IN> Exp() <NI>
}

void MulExp() : {} // MulExp -> "*" Exp
{
  "*" Exp()
}

void FPLitExp() : {} // FPLitExp -> FPLIT
{
  <FPLIT>
}

void IDExp() : {} // IDExp -> ID
{
  <ID>
}

The purpose of this posting isn't to explain JJTree and JavaCC's notation — there's documentation to do that — but just to provide a complete and ready-to-run example. So I shan't explain exactly how the code above means what it means. But you will notice that the file contains a number of definitions, one under the heading "The start nonterminal and its productions", and the others headed "Other nonterminals and their productions". By each of these, the author has written a comment such as

// Factor -> LetExp | FPLitExp | IDExp
These comments specify the corresponding grammar rule using Extended BNF, and you can see how closely they match their JJTree code.

Now here's the class that calls this parser. It, I got from the CalcDumpTree.java link near the top of Calc parsing example in JavaCC and JJTree. I put it into c:\dobbs\CalcDumpTree.java — on my site here — and edited it by inserting its filename on the first line and by inserting a "package dobbs;" statement:

/* CalcDumpTree.java */


/*
  This program parses Calc programs and prints the resulting parse tree.
  Calc programs should be entered on standard input.
  The resulting parse tree is printed on standard output.
*/


package dobbs;


class CalcDumpTree {

  public static void main(String args[]) {
  
    /*
      Create a parser which reads from standard input
    */
    CalcParser parser = new CalcParser(System.in);
    
    try {
    
      /*
        Start parsing from the nonterminal "Start".
      */
      SimpleNode parseTree = parser.Start();
      
      /*
        If parsing completed without exceptions, print the resulting
        parse tree on standard output.
      */
      parseTree.dump("");
      
    } catch (Exception e) {
    
      /*
        An exception occurred during parsing.
        Print the error message on standard output.
      */
      System.out.println("---------------------------");
      System.out.println("Sorry, couldn't parse that.");
      System.out.println(e.getMessage());
      System.out.println("---------------------------");
      
      /*
        Print the call stack
      */
      System.out.println("Call stack:");
      e.printStackTrace();
      System.out.println("---------------------------");
    }
  }
  
}

Now I'll show you how I converted the parser into Java. I've already said that the JavaCC distribution provides batch files with which to run it and JJTree. Because I unzipped it into c:\javacc4.2, my copies of these batch files are in c:\javacc4.2\bin\ . I need to use two of them.

First, from c:\dobbs\ , I give this command:

c:\javacc4.2\bin\jjtree CalcParser.jjt
This calls JJTree, which generates some auxiliary Java files to represent parts of the parse tree, and also generates a file CalcParser.jj . JJTree reported its progress with these messages:
Java Compiler Compiler Version 4.2 (Tree Builder)
(type "jjtree" with no arguments for help)
Reading from file CalcParser.jjt . . .
File "Node.java" does not exist. Will create one.
File "SimpleNode.java" does not exist. Will create one.
File "CalcParserTreeConstants.java" does not exist. Will create one
File "JJTCalcParserState.java" does not exist. Will create one.
Annotated grammar generated successfully in .\CalcParser.jj

If you now look at CalcParser.jj, you will see that it contains grammar rules similar to CalcParser.jjt's, but with a lot of added code. This code is Java semantic actions — that is, commands to be executed as the parser runs. Parsers typically use semantic actions to check their input beyond what the grammar itself can do, and also to build a representation of the input for use once parsing has finished. A parse tree is such a representation, and if you didn't have JJTree, you'd have to code your own semantic actions to build one. But luckily, you do have JJTree.

I now need to convert CalcParser.jj into Java. To do so, I type this command:

c:\javacc4.2\bin\javacc CalcParser.jj
And JavaCC reports its progress much as did JJTree, telling me the parser-support files it's had to create:
Java Compiler Compiler Version 4.2 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file CalcParser.jj . . .
File "TokenMgrError.java" does not exist. Will create one.
File "ParseException.java" does not exist. Will create one.
File "Token.java" does not exist. Will create one.
File "SimpleCharStream.java" does not exist. Will create one.
Parser generated successfully.

Next, I must compile the parser. So I give this command:

javac CalcParser.java
I now have a CalcParser.class file.

And I must compile the class that calls the parser, which I do thus:

javac CalcDumpTree.java

And finally, I can run this. So I give the command:

java dobbs.CalcDumpTree
This program reads from the standard input, so I have to type an expression to be parsed. I type 1.2, then hit Enter, and then type Control-z for end of file, and then hit Enter again. And CalcDumpTree outputs this, which is the parse tree I showed in my table:
Start
 Exp
  Factor
   FPLitExp

By the way, it seems necessary for the Control-z to be on a line of its own, with nothing before it on the same line. Otherwise, the parser gives a "TokenManagerError", complaining about an unexpected character code 26. I suspect this is an oddity in Windows, not always recognising Control-z as end of file.

Now you can try running the parser again, but with some other input. For example, why not try the example from the top of CalcParser.jjt:

let radius = 5.0 in
  let pi = 3.14 in
    2.0 * pi * radius
  ni
ni
That will give you a more complicated parse tree.