Trying Marpa


I’ve finally come to try out Marpa. To blow a mosquito with a cannon.

It’s not mystery that I usually revert to the Parsing toolkit in cglib when I have to build a parser. I mean, I wrote it and I have to give it some sense.

So far, this has prevented me doing two things:

  • learn more on what regular expressions have become in later years, and
  • learn something about Marpa (well… release 2 at least).

This is usually related to the fact that I don’t need to implement fancy grammars, hence coding a parser at a lower level is not a real issue.

The occasion, though, came with quiz #2 in the 19th day of Advent of Code. While the 1st part could be easily addressed with something definitively simpler, the same solution failed miserably with the alternative input grammar.

At this point, instead of fiddling with my original code, I decided to climb the giant’s back and stand on its shoulders. From there, there is a good view of the mosquito, so it’s impossible to miss it with our Marpa cannon!

Here’s the code that did the trick for me:

#!/usr/bin/env perl
use 5.024;
use warnings;
use autodie;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;
use File::Basename qw< basename >;

use Marpa::R2;

my @rules;
my $filename = shift || basename(__FILE__) =~ s{\.pl\z}{.input}rmxs;
open my $fh, '<', $filename;
while (<$fh>) {
   last unless m{\S}mxs;
   my ($id) = m{\A (\d+):}mxs;

   s{:}{ ::=}gmxs;

   $rules[$id] = $_;

my $dsl = join "\n",
   ':default ::= action => [name,values]',
   'lexeme default = latm => 1',

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
say scalar grep { eval { s{\s+}{}gmxs; $grammar->parse(\$_) } } <$fh>;

The first part reads all rules in, and stores the in @rules in ascending order. This will also allow putting rule 0 at the beginning and make it the default rule.

While reading input rules in, we also transform them on the fly in a format that will be good for Marpa::R2. In particular, we prepend the string Rule to every rule, transform plain : colon character in the so-called BNF operator ::= and turn double quotes in single ones. In other terms, just a few adaptations on the format.

In practice, this:

0: 1 2
1: "a"
2: 1 3 | 3 1
3: "b"

is turned into this:

Rule0 ::= Rule1 Rule2
Rule1 ::= 'a'
Rule2 ::= Rule1 Rule3 | Rule3 Rule1
Rule3 ::= 'b'

Variable $dsl eventually holds our whole grammar, with the addition of a pair of initial configurations taken straight from the SYNOPSIS:

:default ::= action => [name,values]
lexeme default = latm => 1
Rule0 ::= Rule1 Rule2
Rule1 ::= 'a'
Rule2 ::= Rule1 Rule3 | Rule3 Rule1
Rule3 ::= 'b'

Building the grammar text was really the hard part, because Marpa can do the rest from now on. We get an object $grammar that can parse valid inputs, then use it to filter the several input strings. The eval in the grep is necessary because inputs that fail to be matched will result in an exception. so we need to catch it. Additionally, grep in this case is very useful to count things, which is what we are requested to do.

This solution is completely general, so it can be applied both to the original input and to the modified one. I’ve found it easier to have two input files, one for each step in the day 19 challenge. You might want to do differently!

So… thanks to Jeffrey Kegler for saving my day 🙄

Comments? Octodon, , GitHub, Reddit, or drop me a line!