Hari's Corner: Humour, comics, tech, law, software, reviews, essays, articles and HOWTOs intermingled with random philosophy now and then
Writing a Toy Calculator scripting language with Java and ANTLR 4 - Part 1
Tutorials and HOWTOs by
Posted on Sat, Apr 4, 2020 at 22:16 IST (last updated: Sun, Apr 5, 2020 @ 12:24 IST)
I have always been interested in pattern matching and parsing, and recently rekindled my interest in Java, so it was only natural that I came across ANTLR. ANTLR is a parser generator that can target multiple languages and was originally written to generate Java code. It caught my interest because it not only combines a lexer with a parser generator, it also builds a parse-tree object, ready for you to feed into further processing, compiling or code generation. You can certainly use the lex/yacc or flex/bison combination, but ANTLR seems much more convenient. For the theoretically minded, ANTLR 4 generates adaptive LL(*) parsers, also written ALL(*); earlier major versions used LL(k) and then LL(*).
In this article, I will discuss how to set up ANTLR with NetBeans and Maven and how to build a simple toy calculator language with this wonderful tool. While pattern matching and parsing are complex subjects, ANTLR certainly does a lot of the heavy lifting and helps you focus purely on the grammar of the language you are designing and using the results of parsing that grammar.
Why this article (series)
This article is basically meant to document my own effort at learning ANTLR and also to give others an idea of how to build a basic "language" with this tool. I found many tutorials, Stack Overflow answers and so on, both advanced and simple, and have tried to distill their essence here.
Defining our scope for the First Cut
The grammar I am about to design is very simple, yet involved enough to get an idea of what it is about, and it should be easy enough to extend. Since this is as much a learning experience for me as for the reader, I appreciate corrections/comments if I have made any mistakes or errors.
I assume that you have a basic knowledge of pattern matching (or regular expressions) and a working knowledge of Java.
Before going into the technicalities, it is good to define our scope. Most of the tutorials I read dived straight into the code, which makes it somewhat hard to understand what the author is trying to achieve. Avoiding that temptation, before writing a single line of code/grammar construct, I wanted to explain the scope which makes both the features and limitations of our end result clear. So, here is what I intend to achieve in the first cut of our grammar of ToyCalc:
- It is a basic calculator which stores exactly one value (float/double) in memory. All operations are carried out on that value. The following operations would be supported: Setting (or resetting) the value, and basic operations like adding, subtracting, multiplying, dividing. If no value is set initially, our calculator will assume 0.
- Our calculator language will support a statement to print textual messages (a very basic string without escape sequences) and a statement to display the current value.
- Our calculator will execute all statements from top to bottom immediately and unconditionally. Hence in our first cut, we will not build any "Syntax Tree" or construct. This is so that I can concentrate on designing the "grammar" part.
- What we don't have: conditionals, loops, variables etc.
- Our program will consist of one or more statements.
- Each statement will be one of the following:
  - SETVALUE <num> - set or reset the current value.
  - GETVALUE - print the current value.
  - ADD <num> - add <num> to the current value and store the result as the current value.
  - SUB <num> - similarly, subtract <num> from the current value.
  - MUL <num> - multiply the current value by <num>.
  - DIV <num> - divide the current value by <num>.
  - PRINT <string> - print a simple text message.
- Comments will start with /* and end with */ and everything inside the comments will be ignored.
- A number can be positive or negative, and either an integer or a decimal. Numbers will not support octal, hexadecimal or e-notation representation.
- All whitespace will be ignored, except the significant whitespace separating the above tokens and any whitespace within a string literal delimited by ". We can also choose a different string delimiter character. But our string literal will not support any escape sequences.
- Each statement will be terminated by a semicolon.
- For simplicity's sake, the above keywords will be case-sensitive, i.e. you cannot use setvalue in place of SETVALUE. Or you can make the keywords lower case if you wish; it is a matter of taste. Case insensitivity for keywords will be introduced in the second cut as an option.
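The token shapes described in the list above can be sketched as ordinary Java regular expressions before any grammar is written. This is only an illustration and not the ANTLR grammar developed in this series: the class name ToyCalcTokens and its helper methods are hypothetical, and I am assuming that a number takes an optional leading minus sign (no plus sign) and needs digits on both sides of the decimal point.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper that expresses the first-cut token rules as regexes.
 * Not part of ANTLR; the names here are made up for illustration.
 */
final class ToyCalcTokens {
    // Assumption: optional leading minus, digits, optional fractional part.
    // No octal, hexadecimal or e-notation forms are accepted.
    static final Pattern NUMBER = Pattern.compile("-?[0-9]+(\\.[0-9]+)?");

    // A double-quoted string with no escape sequences, so the text between
    // the quotes may contain anything except another double quote.
    static final Pattern STRING = Pattern.compile("\"[^\"]*\"");

    // A /* ... */ comment. DOTALL lets it span lines, and the reluctant
    // quantifier makes it stop at the first closing */.
    static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static boolean isNumber(String text) {
        return NUMBER.matcher(text).matches();
    }

    static boolean isString(String text) {
        return STRING.matcher(text).matches();
    }

    static boolean isComment(String text) {
        return COMMENT.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("-110"));  // true
        System.out.println(isNumber("5.5"));   // true
        System.out.println(isNumber("1e5"));   // false: e-notation is out of scope
        System.out.println(isNumber("0x1F"));  // false: hexadecimal is out of scope
        System.out.println(isString("\"The current value is: \"")); // true
    }
}
```

In an ANTLR grammar these shapes would become lexer rules rather than java.util.regex patterns, but the sketch shows roughly what each rule has to accept and reject.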
A sample program (of our first cut ToyCalc) will look like this:
/* A sample ToyCalc Script
hello world. */
SETVALUE -110;/* Note that since we
use statement separator ; we can have
multiple statements in one line */MUL 5.5; ADD 10; DIV 3;
PRINT "The current value is: ";
PRINT "The final value is: ";
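To make the expected behaviour of such a script concrete, here is a deliberately naive, hand-written sketch that runs ToyCalc statements without ANTLR. Everything in it (the class NaiveToyCalc and the methods statements and run) is hypothetical and exists only for illustration. It strips comments and splits on semicolons with plain string operations, so it would break on a string literal that contains ; or /*, which is exactly the kind of problem a proper lexer and parser solve.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive, hypothetical interpreter for the first-cut ToyCalc statements. */
final class NaiveToyCalc {

    /** Removes comments and splits the script into trimmed statements. */
    static List<String> statements(String script) {
        // (?s) lets '.' match newlines so multi-line comments are removed.
        String noComments = script.replaceAll("(?s)/\\*.*?\\*/", "");
        List<String> result = new ArrayList<>();
        for (String part : noComments.split(";")) {
            String stmt = part.trim();
            if (!stmt.isEmpty()) {
                result.add(stmt);
            }
        }
        return result;
    }

    /** Executes statements top to bottom and returns the final value. */
    static double run(String script) {
        double value = 0.0; // the calculator assumes 0 if no value is set
        for (String stmt : statements(script)) {
            String[] parts = stmt.split("\\s+", 2);
            String keyword = parts[0];
            String arg = parts.length > 1 ? parts[1] : "";
            // Caveat: Double.parseDouble also accepts e-notation and hex
            // floats, which the ToyCalc scope above excludes.
            switch (keyword) {
                case "SETVALUE": value = Double.parseDouble(arg); break;
                case "ADD":      value += Double.parseDouble(arg); break;
                case "SUB":      value -= Double.parseDouble(arg); break;
                case "MUL":      value *= Double.parseDouble(arg); break;
                case "DIV":      value /= Double.parseDouble(arg); break;
                case "GETVALUE": System.out.println(value); break;
                case "PRINT":
                    // Drop the surrounding double quotes; no escapes to handle.
                    System.out.print(arg.substring(1, arg.length() - 1));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown statement: " + stmt);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        String script = "/* demo */ SETVALUE -110; MUL 5.5; ADD 10; DIV 3;\n"
                + "PRINT \"The final value is: \"; GETVALUE;";
        run(script);
    }
}
```

Working through the arithmetic by hand: SETVALUE -110 followed by MUL 5.5 gives -605, ADD 10 gives -595, and DIV 3 gives roughly -198.33.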
So, without further ado, let us dive into the requirements.
- NetBeans, which can be downloaded from the Apache NetBeans website. I am using NetBeans 11.0 LTS.
- A Java JDK. If you are using Linux, you can probably use your distro's package management tool to get OpenJDK. On Windows, you can get the latest Oracle JDK from the Java website and install it (or, if you are worried about Oracle's licensing, you can still use OpenJDK on Windows).
That's about it. Other dependencies can be pulled in using Maven directly in NetBeans to build your project.
- Fire up NetBeans.
- Under the Tools menu, go to Java Platforms. Add the correct path to your JDK.
- File -> New Project...
- Under the category, Java with Maven, choose
- I named my Application ToyCalc giving the group ID org.harishankar. You can give whatever name you want. NetBeans will automatically create the package org.harishankar.toycalc for this application.
You should now have an empty Maven project.
Will be continued in Part 2.
In this series
- Writing a Toy Calculator scripting language with Java and ANTLR 4 - Part 4
- Writing a Toy Calculator scripting language with Java and ANTLR 4 - Part 3
- Writing a Toy Calculator scripting language with Java and ANTLR 4 - Part 2
- Writing a Toy Calculator scripting language with Java and ANTLR 4 - Part 1