Generating Parsers in C++ with Maphoon
Part 2 of 2
We introduce Maphoon, a tool for generating parsers in C++, and use it to demonstrate bottom-up parsing and tokenizing. The tool is written in C++17 and generates parsers in C++17. When compiling a programming language (for example Python or Java), the compiler starts by creating a tree representation of the input. This process is called 'parsing'.
Parsing consists of two stages. The first stage cuts the input text into atomic pieces (like numbers, identifiers, operators, and strings). This stage is called 'tokenizing'. The second stage builds the tree representation from the tokens. The resulting tree representation is usually called an 'AST' (Abstract Syntax Tree).
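To make the first stage concrete, here is a minimal hand-written tokenizer in C++17. It is only a sketch of what tokenizing does, not the code Maphoon generates and not Maphoon's API. The names TokKind, Token and tokenize are made up for this illustration, and the token rules (decimal digits, C-style identifiers, double-quoted strings without escapes, single-character operators) are simplifying assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token categories for this sketch; not Maphoon's types.
enum class TokKind { Number, Identifier, Operator, String };

struct Token {
    TokKind kind;
    std::string text;  // the exact piece of input that forms the token
};

static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool isIdentStart(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Cut the input into tokens. Whitespace separates tokens and is dropped.
// Assumption: an unterminated string simply runs to the end of the input.
std::vector<Token> tokenize(std::string_view in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (isSpace(in[i])) { ++i; continue; }
        const std::size_t start = i;
        TokKind kind;
        if (isDigit(in[i])) {
            kind = TokKind::Number;
            while (i < in.size() && isDigit(in[i])) ++i;
        } else if (isIdentStart(in[i])) {
            kind = TokKind::Identifier;
            while (i < in.size() && (isIdentStart(in[i]) || isDigit(in[i]))) ++i;
        } else if (in[i] == '"') {
            kind = TokKind::String;
            ++i;                                         // opening quote
            while (i < in.size() && in[i] != '"') ++i;   // string body
            if (i < in.size()) ++i;                      // closing quote, if present
        } else {
            kind = TokKind::Operator;
            ++i;                                         // any other single character
        }
        assert(i > start);  // every branch consumes at least one character
        out.push_back({kind, std::string(in.substr(start, i - start))});
    }
    return out;
}
```

Each branch corresponds to what would be one regular expression in a generated tokenizer. The second stage then works on the resulting token sequence instead of on raw characters.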
Maphoon supports both stages of the parsing process. For tokenizing, one can use regular expressions combined with user-written code. The design is aimed at obtaining as much automation as possible without compromising flexibility. On the theoretical side, we use a new representation of finite automata.
For parsing, one must give the description of a formal grammar together with code fragments that construct the AST. The parser generator reads this description and creates an executable LALR parser. It follows the standard theory, but allows operators to be defined at runtime and error messages to be fine-tuned. Error reporting has traditionally been a weakness of bottom-up parsing.
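The idea of bottom-up parsing with operators defined at runtime can be illustrated with a small shift-reduce sketch in C++17. This is deliberately not an LALR parser and not Maphoon's generated code: it uses plain operator-precedence parsing over binary infix operators, which is a much simpler bottom-up technique. All names (Ast, OpParser, defineOp, show) are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// A minimal AST node: leaves have no children, operator nodes have two.
struct Ast {
    std::string label;
    std::vector<std::unique_ptr<Ast>> kids;
};

struct OpInfo { int prec; bool leftAssoc; };

class OpParser {
public:
    // Operators live in a table that can be changed while the program runs.
    void defineOp(const std::string& op, int prec, bool leftAssoc) {
        ops_[op] = {prec, leftAssoc};
    }

    // Parse a token sequence of the form: operand (operator operand)*.
    std::unique_ptr<Ast> parse(const std::vector<std::string>& tokens) const {
        std::vector<std::unique_ptr<Ast>> operands;  // subtrees built so far
        std::vector<std::string> pending;            // operators not yet reduced
        bool expectOperand = true;
        for (const auto& tok : tokens) {
            const auto it = ops_.find(tok);
            if (it == ops_.end()) {
                if (!expectOperand)
                    throw std::runtime_error("expected operator before '" + tok + "'");
                auto leaf = std::make_unique<Ast>();
                leaf->label = tok;
                operands.push_back(std::move(leaf));  // shift an operand
                expectOperand = false;
            } else {
                if (expectOperand)
                    throw std::runtime_error("expected operand before '" + tok + "'");
                // Reduce while the operator on the stack binds at least as tightly.
                while (!pending.empty() && reduceFirst(pending.back(), it->second))
                    reduce(operands, pending);
                pending.push_back(tok);               // shift the operator
                expectOperand = true;
            }
        }
        if (expectOperand) throw std::runtime_error("expected operand at end of input");
        while (!pending.empty()) reduce(operands, pending);
        assert(operands.size() == 1);
        return std::move(operands.back());
    }

private:
    bool reduceFirst(const std::string& top, const OpInfo& incoming) const {
        const OpInfo& t = ops_.at(top);
        return t.prec > incoming.prec || (t.prec == incoming.prec && incoming.leftAssoc);
    }

    // Replace the two topmost subtrees and the topmost operator by one node.
    static void reduce(std::vector<std::unique_ptr<Ast>>& operands,
                       std::vector<std::string>& pending) {
        auto node = std::make_unique<Ast>();
        node->label = pending.back();
        pending.pop_back();
        auto rhs = std::move(operands.back()); operands.pop_back();
        auto lhs = std::move(operands.back()); operands.pop_back();
        node->kids.push_back(std::move(lhs));
        node->kids.push_back(std::move(rhs));
        operands.push_back(std::move(node));
    }

    std::map<std::string, OpInfo> ops_;
};

// Print an AST in prefix form, e.g. (+ a (* b c)).
std::string show(const Ast& t) {
    if (t.kids.empty()) return t.label;
    std::string s = "(" + t.label;
    for (const auto& k : t.kids) s += " " + show(*k);
    return s + ")";
}
```

After defineOp("+", 1, true) and defineOp("*", 2, true), parsing the tokens a + b * c yields an AST that show renders as (+ a (* b c)). Because the table is consulted during parsing, a further operator can be defined between two calls of parse without regenerating anything. The error messages name the token at which the parser got stuck, which is the kind of detail a generated parser needs help to report well.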
Maphoon is designed to be user friendly, flexible, to support modern C++, to generate efficient parsers, and to be transparent about the underlying theory, so that it can be used in teaching. It is possible to show the parsing process and the underlying automata.
Hans de Nivelle
Hans de Nivelle has a PhD from Delft University in the Netherlands. The topic of his thesis was techniques for automated proof search in logic. From 1999 to 2007, he worked as a full-time researcher at the Max-Planck Institute for Computer Science in Saarbruecken, Germany. His main research topic was still automated proof search. Since this involves search, it requires efficient implementation. For this purpose, he started using C++ in 2003.
From 2007 to 2017, Hans de Nivelle was a professor at the University of Wroclaw, Poland, where he continued doing research on automated proof search, but in combination with interactive proof construction. In Wroclaw, he taught formal logic, compiler construction, flight simulation, and programming in C++.
Since 2018, Hans de Nivelle has been a professor at Nazarbayev University, Kazakhstan. In the last three years, he has been teaching programming in Haskell, Prolog, Java, Python, C, and C++, as well as formal language theory and compiler construction. He is currently working on a compiler for a new programming language for logic, which is being implemented in C++. The current talk about parser generation in C++ is a result of this project. Once this programming language is finished, Hans de Nivelle hopes to return to logic as his main research topic and become a user of his own programming language.