Ordeal::Model::Parser: parsing


Where we take a look at the domain parsing function in Ordeal::Model::Parser.

As before, we will refer to the code for version 0.003. We will not get into the details for everything, just take a look at some of the functions - the other ones are implemented pretty much in the same way.

Behind-the-scenes entry point

The de-facto entry point is _expression:

 1 sub _expression {
 2    state $r = sub ($rtext) {
 3       state $addend = _addend();
 4       state $seq = __sequencer(
 5          $addend,
 6          __starer(__sequencer(__regexper(qr{([-+])}), $addend)),
 7       );
 8       state $name_for = {'+' => 'sum', '-' => 'subtract'};
 9       my $match = $seq->($rtext) or return;
10       my ($retval, $transformations) = $match->@*;
11       for my $t ($transformations->@*) {
12          my ($op, $addend) = $t->@*;
13          $retval = [$name_for->{"$op->@*"}, $retval, $addend];
14       }
15       return $retval;
16    };
17 }

As most (if not all!) functions in this parser, this factory is implemented to always return the same function, by memoizing the parsing workhorse as a state variable.

According to the grammar, an expression is:

expression: addend ( addop addend )*
addop: '-' | '+'

and lines 3 to 7 implement pretty much this. For simplicity, the addop is implemented as a regular expression to get both characters with a single parsing function.

Lines 8 to 15 deal with the building of the Abstract Syntax Tree. Both operations are supposed to be associative on the left, so we start from the left-most element (that is the $addend in line 5) and for each item collected in the * sequence we build a new node based on the result of the previous iteration, the specific operation in that item and the associated addend. Operations names are changed on the fly to use a more verbose version.

In other words, a sequance like this

A - B + C - D

is turned into this data structure (sort of, the terminals are shown as play strings but are something different actually):


You can see how easy is to evaluate this: for each level, get calculate the two operands (possibly in a recursive way), then apply the specific operation.

The addend

From the grammar:

addend: ( positive_int multop )* atom ( multop positive_int )*
multop: '*' | 'x'

The implementation is pretty much it:

 1 sub _addend {
 2    state $r = sub ($rtext) {
 3       state $op = __regexper(qr{([*x])});
 4       state $seq = __sequencer(
 5          __starer(__sequencer(_positive_int(), $op)),
 6          _atom(),
 7          __starer(__sequencer($op, _positive_int())),
 8       );
 9       my $match = $seq->($rtext) or return;
10       my ($pre, $retval, $post) = $match->@*;
11       $retval = ___mult($retval, reverse($_->@*)) for reverse($pre->@*);
12       $retval = ___mult($retval,        ($_->@*)) for        ($post->@*);
13       return $retval;
14    }
15 }

The main workhorse is a sequence of a star expression (line 5), an atom (line 6), and another star expression (line 7). After the match (line 9) we find the Abstract Syntax Tree building.

Note that operations are applied from the atom outwards, with preference for the prefixes. Going outwards means that elements before the atom enjoy a reverse. The ___mult is a helper that builds a node for the AST.

The atom

From the grammar we have:

atom: ( identifier | sub_expression ) unaryop* 
sub_expression: '(' expression ')'
unaryop: shuffler | simple_slicer | slicer | sorter

In the implementation, for a bit of added clarity it has been implemented like follows:

atom: atom_base atom_unary* 
atom_base: identifier | sub_expression
sub_expression: '(' expression ')'
atom_unary: shuffler | simple_slicer | slicer | sorter

In code:

 1 sub _atom {
 2    state $base = _atom_base();
 3    state $unaries = __starer(_atom_unary());
 4    state $retval = sub ($rtext) {
 5       my $retval = $base->($rtext) or return;
 6       for my $unary ($unaries->($rtext)->@*) {
 7          my ($op, @rest) = $unary->@*;
 8          $retval = [$op, $retval, @rest];
 9       }
10       return $retval;
11    };
12 }

Lines 2 and 3 build the two pieces for atom_base and atom_unary in the grammar and assemble them in a function (lines 4 to 11) that does the matching attempt (line 5) and, when successful, builds the Abstract Syntax Tree section for it. The logic is pretty much the same we saw before in expression, only with different pieces that have a higher priority of evaluation (the unary operators).

The code for atom_base completes the thing:

 1 sub _atom_base {
 2    state $sub_expression = sub ($rtext) {
 3       state $seq = __sequencer('(', _expression(), ')');
 4       my $match = $seq->($rtext) or return;
 5       return $match->[1];
 6    };
 7    state $retval = __alternator(
 8       _identifier(),
 9       $sub_expression,
10    );
11 }

Again, pretty much a literal translation, with a sub-expression (line 2 to 6) and an identifier that are combined with an alternation operation (lines 7 through 10).

Just for curiosity, let’s take a look at one of the unary operators:

sub _shuffler { state $r = __exact('@', 'shuffle') }

It’s just an exact match for the character @, which gets eventually returned as the string shuffle. Nothing terribly exciting.

This is about it

As anticipated, we will not go through each and every function, the examples above should provide a precise idea of the philosophy of the module:

  • every parsing bit is a reference to a sub that complies with a specific interface (text input as reference to a scalar, output an anonymous array or undef)
  • generic functions are leveraged to combine bits according to the usual rules for a grammar (e.g. *, |, etc)
  • the grammar is translated into code first assembling the different parsers, then operating on the return value to give the resulting Abstract Syntax Tree the right shape, so that it can be easily used for evaluation at a later stage.

At this point… it’s all folks!

The whole series

Want to look at the other parts of this series? Here’s a list of them:

Comments? Octodon, , GitHub, Reddit, or drop me a line!