Hari's Corner

Humour, comics, tech, law, software, reviews, essays, articles and HOWTOs intermingled with random philosophy now and then

Writing a Toy Calculator scripting language with Java and ANTLR 4 - Part 4

Filed under: Tutorials and HOWTOs by Hari
Posted on Thu, Apr 9, 2020 at 11:33 IST (last updated: Thu, Apr 9, 2020 @ 17:51 IST)

In this series < Previous

In the previous parts, we have seen how to conceptualize our Toy Calculator language and also build our grammar rules in ANTLR 4. To recall, we defined the parsing rules in a top down manner, starting from the conceptual level of a what is a "toycalc" program (a series of statements) and what is a "statement" (can be either of an "operation" on the calculator current value, displaying the current value, or displaying some text). An operation can be one of setting the current value, adding to, subracting from, multiplying with or dividing the current value with some other value.

A brief overview of lexing and parsing

From our grammar, ANTLR will generate code for both lexing and parsing. At the stage of lexing, a stream of characters are analyzed and split into tokens (defined as rules starting with upper case characters in our grammar file). Though the lexing is the initial stage, our lexing rules conventionally are placed below the parsing rules, or optionally lexing rules can be kept in a separate grammar.

The parser then takes a stream of tokens from the lexer and tries to match the grammar, constructing a parse tree in the process.

An important behaviour of ANTLR is that when the ANTLR parser encounters an error in parsing (i.e. comes across tokens not matching any rule) it stops parsing. However, if the tokenizer itself cannot spit out any recognized tokens from the input characters, it continues to tokenize, spitting out a kind of warning for the unrecognized tokens, but does not halt and continues to try and tokenize the remaining input. To change this kind of behaviour, we can either implement an "ERROR" token rule (which will match any character and emit a token, but this rule will come at the end of all other rules). When we define such an error token (as matching any character) this token is fed into the parser, but the parser cannot find any rule to match the token and thus fails. For fine grained error handling, we can override ANTLR's error handling mechanism (but this is beyond the scope of our current project).

Keeping in mind the above, without further ado, let us build the project. Choose the menu Run and click on Clean and Build Project (ToyCalc)

If all goes well, this should generate a series of files of ANTLR 4 in "generated sources". The following files should have been created:

Generated Sources (antlr4)  
+---<default package>
|------ToyCalc.tokens
|------ToyCalcLexer.tokens
+---org.harishankar.ToyCalc
|------ToyCalc.interp
|------ToyCalcBaseListener.java
|------ToyCalcLexer.interp
|------ToyCalcLexer.java
|------ToyCalcListener.java
|------ToyCalcParser.java

What is of most interest to us is the ToyCalcBaseListener class (which implements the ToyCalcListener interface). By default, ANTLR 4 generates a "listener" for the parse tree generated by it. What this listener does is to emit certain actions when either entering or exiting a parsing rule. The base listener class does nothing by default. By extending this base listener class, we can code the actions to perform upon entering each rule or exiting each rule. This makes it very easy to implement actions for each parsing rule. The action can be building a node for generating an AST (Abstract Syntax Tree) or it can be some other computation. For our first cut of ToyCalc, we will not go into building any AST since as per our specification (recall part 1 of this article), each statement is executed immediately and unconditionally. For non-trivial programming languages which involves even basic constructs like looping (for or while loops for example), conditional branching (if - then - else or switch statements) etc. an AST is a must, because the parse listener by itself cannot analyze anything other than the parsing rule it encounters. The parser's job is then restricted to constructing an AST from an input source, which can be used as a basis for further processing like generating code etc.

Extending the base listener

Let us extend the base listener to define our customized actions to make our Toy Calculator work. In NetBeans, create a new Java class by right clicking on our namespace, org.harishankar.toycalc inside the Projects tab and clicking New -> Java Class.... Name the class ToyCalcExtendedListener. Make it extend ToyCalcBaseListener.  The code should look like this:

public class ToyCalcExtendedListener extends ToyCalcBaseListener {

}

Let us now extend the base listener by overriding the listener methods. The ToyCalcBaseListenerclass inside the generated sources will look like this (stripped of the imports and JavaDoc comments):

public class ToyCalcBaseListener implements ToyCalcListener {
	@Override public void enterToycalc(ToyCalcParser.ToycalcContext ctx) { }
	@Override public void exitToycalc(ToyCalcParser.ToycalcContext ctx) { }
	@Override public void enterStatement(ToyCalcParser.StatementContext ctx) { }
	@Override public void exitStatement(ToyCalcParser.StatementContext ctx) { }
	@Override public void enterEveryRule(ParserRuleContext ctx) { }
	@Override public void exitEveryRule(ParserRuleContext ctx) { }
	@Override public void visitTerminal(TerminalNode node) { }
	@Override public void visitErrorNode(ErrorNode node) { }
}

Note that, enter_ methods are called when the parser enters into the matching rule and exit_ methods are called when the parser exits the matching rule (i.e. after parsing the rule).

First, for our toy calculator, we need to store the current value of the calculator across the life cycle of the parse. For this we will define a private member called calculator_val of the type double inside our extended listener. Also when any Toy Calculator program is initialized, it should be 0. Let us override the enterToycalc method, since it is the entry-point rule of our parser. We will also display a goodbye message to the user when the program is completed.  Our class should now look like this:

public class ToyCalcExtendedListener extends ToyCalcBaseListener {
            private double calculator_val;
            @Override
            public void enterToycalc(ToyCalcParser.ToycalcContext ctx) {
                calculator_val = 0;
            }
           @Override
            public void exitToycalc(ToyCalcParser.ToycalcContext ctx) {
                System.out.println ("Goodbye!");
            }
}

Note that the context object ctx is of the type ToyCalcParser.ToycalcContext which holds the parse information for this particular rule. Next let us implement the statement rule, which as per our grammar is defined as:

statement   : (OPERATION EXPR | PRINT STRING | GETVALUE);

Correspondingly, let us override the exitStatement rule to implement its functionality (we will perform our action after the rule is parsed). First, we will implement the PRINT functionality, which should simply output a string enclosed in double quotes. Inside the ToyCalcExtendedListener class, override the exitStatement method.

@Override
public void exitStatement(ToyCalcParser.StatementContext ctx) {
    // If it is the print statement then get the token "string" and print it
    if (ctx.PRINT() != null )
        System.out.print (ctx.STRING().toString().replace("\"", ""));
}

The above simply checks if the context has a PRINT token and if it does, it prints the string token STRING. Note that the Java String.replace function is called because we want to remove the extraneous enclosing double quotes (the lexer returns the matched token as is).

Now that we have an action inside our parser, let us check if it works. For this, we need to create a Main class, and implement our parser by feeding it some input. Create a new class called Main in your package folder and add the following code.

public final class Main {
    public static void main (String [] args) {
        // Define a string to parse
       String myProgram = "PRINT \"Hello World\";";
        // Create a charstream from the string
        CharStream prog = CharStreams.fromString (myProgram);
        // Pass it through the lexer and create a token stream
        ToyCalcLexer lex = new ToyCalcLexer(prog);
        CommonTokenStream stream = new CommonTokenStream(lex);
        // Create a parser and pass the token stream to the parser 
        ToyCalcParser par = new ToyCalcParser (stream);
        // Create and add a new instance of our parse listener 
        // to the parser
        par.addParseListener(new ToyCalcExtendedListener());
        // finally invoke the entry point of the parser
        par.toycalc();
    }
}

(I am not reproducing the import statements. NetBeans automatically prompts requirement to import when you use a class from another package.)

As you can see, the process involves getting some input in the form of a character stream into the lexer, then creating a token stream from the lexer and passing it through the parser. Finally we add our custom parse listener to the parser before invoking the entry rule of our parser. In the above program, for testing purpose, we used a String to define our ToyCalc program, but normally we can read the input from a file using the standard Java IO classes.

Now build and run the program. In our output window in NetBeans, you should see this output.

...
...
------------------------------------------------------------------------
Building ToyCalc 1.0-SNAPSHOT
------------------------------------------------------------------------

--- exec-maven-plugin:1.5.0:exec (default-cli) @ ToyCalc ---
Hello WorldGoodbye!
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.088 s
...

Great! Our program has printed the string Hello World and also the Goodbye!  message, which means that our program has correctly parsed the statement and displayed the string and also exited the parser. Play around by adding some more PRINT messages (separated by a semicolon) to the input string and see if it works correctly. Since we have not used a newline character or used System.out.println everything will be displayed on one line. We can trivally implement a PRINTLN statement to the grammar and make that display a string followed by a newline. I leave that as an exercise for the reader.

In the next part we will implement the remaining bits of our Toy calculator language. We will also make our program take a file as an input (as specified in the command line) rather than a hard-coded string.

In this series

No comments yet

There are no comments for this article yet.

Comments closed

The blog owner has closed further commenting on this entry.