Writing a LaTeX algebraic expression parser in C++ (Part 1)

4 min readDec 11, 2020

Earlier this year I came across an expression parser (feq-parse) from Fluid Numerics written in OO-Fortran. Dr. Schoonover, the code’s author, had implemented the parser in a classic binary tree-like fashion. An infix algebraic expression is sieved character-by-character, pushing each respective character onto a stack of tokens, which is then converted into a token stack of postfix form (Reverse Polish Notation) to be evaluated. As a non-Computer Science major I have never worked with nor seen such a parser, and thought it to be a worthwhile personal project to implement in modern C++.

I began my journey by researching various implementations of binary expression trees, many of which were implemented in C in a similar manner as described above. The greatest treasures among this wealth of information were not the provided code, rather the pretty graphs depicting these trees. Having visuals handy to rationalize the necessary logic was immensely helpful, and rest assured that I had scrawled countless binary expression trees on paper to concretize my approach.

I started by writing a polymorphic “Node” class that is used to represent real numbers and the five primary arithmetic operator classes of “Addition” (+), “Subtraction” (-), “Multiplication” (*), “Division” (/), and “Exponentiation” (^). The “Node” base class implements a virtual “Value” method that, by default, returns the stored real number. The derived operator classes override this “Value” method and return the result of their respective binary operation upon two other “Node” objects’ “Value” methods. This is advantageous as these nodes could be organized in a tree-like fashion, cascading to a final desired answer from a single call to the “Value” method of the base node.

For example, should I wish to calculate “1+2*3” I would implement the binary expression tree as depicted below.

In this specific case, our base node would be of type “Addition” whose “Value” method yields the sum of the “Value” of 1 and the “Value” of the “Multiplication” node consisting of the “Value” of 2 and the “Value” of 3. That is to say that the product of 2 and 3 is calculated first in accordance with the order of operations then the sum is calculated.

The code for the “Node” base class, the “Multiplication” derived class, and the “Addition” derived class are simply as shown below.

The program below implements this tree and yields the output “1+2*3 = 7” as expected.

With the basics aside, now comes the more tedious task of writing the logic of the parser. To achieve this goal, I make use of std::regex to parse strings in a modern fashion. Before I proceed with actually parsing the expression, I first remove all spaces and ascertain that the string is not empty.

And now we parse! Our unescaped regular expression for assessing the left-hand-side (LHS) and right-hand-side (RHS) of any given binary operation appears as follows:

^(.*)([\+\-])(.*)$|^(.*)([\*\/])(.*)$|^(.*?)([\^])(.*)$

Unassuming of the reader’s knowledge of regular expressions, the ‘^’ character indicates the start of a line, the ‘$’ character indicates the end of a line, the ‘|’ character acts as a boolean OR, the ‘.*’ characters match any character as many times as possible, and the ‘.*?’ characters match any character as few times as possible. This regular expression contains a total of 3 alternative expressions, the first matching ‘+’ and ‘-’, the second matching ‘*’ and ‘/’, and the third matching ‘^’. The reason for using ‘.*?’ in contrast to using ‘.*’ for the ‘^’ alternative is because exponentiation has right-to-left associativity, while ‘/’ and ‘-’ have left-to-right associativity (this is a non-issue for ‘*’ and ‘+’ due to the associative property). This ascertains that we assess our operations in accordance with their respective rules of association.

Now that we are able to separate our data, we can recursively parse each side to extract the numerals to be used in our calculation and construct a binary expression tree, assigning LHS and RHS nodes as we traverse our way down the tree. With this, we end up with the code shown below.

And like that, we have the beginnings of our parser! To test I used the program below which greatly simplifies our problem.

This, however, is merely the tip of the iceberg. Many greater tasks lay ahead, such as named variable implementation, tree-to-string composition, simplification procedures, solving equations for specified variables, and much more. These features are a work-in-progress, and all contributions are appreciated. Further blog posts will be made detailing the workings of these features.

Please visit our project here:

https://github.com/CaseySanchez/ExpressionParser

Special thanks to Dr. Schoonover, please visit him at these locations:

https://www.fluidnumerics.com

https://github.com/FluidNumerics

Writing a LaTeX algebraic expression parser in C++ (Part 1)

Written by Casey Sanchez

Responses (2)