Romeo - slice - ETOOBUSY

TL;DR

Romeo now has a couple of slicing features.

I sometimes have to manipulate a series of records from a database, usually coming as a JSON-encoded array, where I’m actually only interested in a subset of the data. This is when slicing comes in handy, allowing me to pick only the interesting parts and leave the rest out.

So far I have used jq, which is still an awesome tool for doing a lot of creative transformations on JSON data. And yet dedicated slicing capabilities still come in handy.

So, for the following examples, let’s assume that our input comes from file data.json, with the following data inside:

[
   {
      "foo": "just a string",
      "goo'''": "bar",
      "array": ["one", "two", "three" ],
      "hash": {
         "sub-hash": {
            "baz": "this is important",
            "galook": "this is not"
         },
         "other-stuff": [ 1, 2, 3 ],
         "then": "other data"
      }
   },
   {
      "foo": "JUST A STRING",
      "goo'''": "BAR",
      "array": ["ONE", "TWO", "THREE" ],
      "hash": {
         "sub-hash": {
            "baz": "THIS IS IMPORTANT",
            "galook": "THIS IS NOT"
         },
         "then": "OTHER DATA"
      }
   }
]

We have fancy keys with single quotes, as well as some missing data (the second record lacks other-stuff inside hash).

Starting simple: the command line

At the basic level, romeo slice lets us use the same paths that are available in Template::Perlish, which is one of my most beloved modules. This just works:

$ romeo slice -i data.json foo hash.then
[
   {
      "foo" : "just a string",
      "hash" : {
         "then" : "other data"
      }
   },
   {
      "foo" : "JUST A STRING",
      "hash" : {
         "then" : "OTHER DATA"
      }
   }
]
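
To make the idea concrete, here is a minimal Python sketch of what this kind of slicing amounts to. It is not how romeo is implemented; the names get_path, set_path and slice_records are made up for illustration, and the sketch only understands plain dotted keys into nested objects (no quoting, no array indexes).

```python
import json

def get_path(record, keys):
    """Follow keys into nested dicts; return None when a key is missing."""
    node = record
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def set_path(target, keys, value):
    """Store value under the nested keys, creating intermediate dicts."""
    node = target
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

def slice_records(records, paths):
    """Keep only the requested dotted paths from each record."""
    result = []
    for record in records:
        out = {}
        for path in paths:
            keys = path.split(".")  # assumption: simple keys, no quoting
            set_path(out, keys, get_path(record, keys))
        result.append(out)
    return result

# A trimmed-down version of data.json, inlined to stay self-contained.
records = json.loads("""
[
  {"foo": "just a string",
   "hash": {"then": "other data", "other-stuff": [1, 2, 3]}},
  {"foo": "JUST A STRING",
   "hash": {"then": "OTHER DATA"}}
]
""")

sliced = slice_records(records, ["foo", "hash.then"])
print(json.dumps(sliced, indent=3))
```

Each output record is rebuilt from scratch, so anything that was not asked for (like other-stuff here) simply never gets copied.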

The data I come across rarely has weird characters in its keys, so plain paths like the ones above Just Work. In the Perl spirit of making hard things possible, though, complex keys are also supported with some help from quoting:

$ romeo slice -i data.json "\"goo'''\"" hash.'"other-stuff"'
[
   {
      "goo'''" : "bar",
      "hash" : {
         "other-stuff" : [
            1,
            2,
            3
         ]
      }
   },
   {
      "goo'''" : "BAR",
      "hash" : {
         "other-stuff" : null
      }
   }
]
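
The quoting can look intimidating, so here is a rough Python sketch of how a path with quoted keys might be split into its parts. This is a deliberately simplified assumption, not the actual Template::Perlish parser: the hypothetical split_path function treats anything between matching single or double quotes as one literal key and does not handle escape sequences.

```python
def split_path(path):
    """Split a dotted path into keys, honoring quoted keys (simplified)."""
    keys = []
    i, n = 0, len(path)
    while i < n:
        if path[i] in "'\"":
            quote = path[i]
            # ValueError is raised here if the closing quote is missing.
            end = path.index(quote, i + 1)
            keys.append(path[i + 1:end])
            i = end + 1
        else:
            end = path.find(".", i)
            if end == -1:
                end = n
            keys.append(path[i:end])
            i = end
        if i < n:
            if path[i] != ".":
                raise ValueError(f"expected '.' at position {i} in {path!r}")
            i += 1  # skip the separator
    return keys

# The shell strips one level of quoting before the program sees these.
first = split_path("\"goo'''\"")
second = split_path('hash."other-stuff"')
print(first, second)
```

The point of the quotes is only to tell the parser where a key starts and ends, so that dots, dashes and quote characters inside it are not mistaken for path syntax.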

The slicing filled in data that does not exist in the input (other-stuff is null in the second record), which might be useful or not. It’s possible to skip non-existing data entirely with option -s/--skip:

$ romeo slice -i data.json "\"goo'''\"" hash.'"other-stuff"' --skip
[
   {
      "goo'''" : "bar",
      "hash" : {
         "other-stuff" : [
            1,
            2,
            3
         ]
      }
   },
   {
      "goo'''" : "BAR"
   }
]
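
The difference between the two behaviours can be sketched in a few lines of Python. Again, this is an illustration under my own assumptions and not romeo's code: slice_record and its skip parameter are hypothetical names, and paths are given as ready-made lists of keys to keep quoting out of the picture.

```python
def slice_record(record, paths, skip=False):
    """Copy the given key paths; missing ones become None or are skipped."""
    out = {}
    for keys in paths:
        node, found = record, True
        for key in keys:
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                found = False
                break
        if not found:
            if skip:
                continue  # leave no trace of the missing path
            node = None   # placeholder, rendered as null in JSON
        target = out
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = node
    return out

# The second record of data.json, reduced to the relevant parts.
record = {"goo'''": "BAR", "hash": {"then": "OTHER DATA"}}
paths = [["goo'''"], ["hash", "other-stuff"]]

filled = slice_record(record, paths)
skipped = slice_record(record, paths, skip=True)
print(filled)
print(skipped)
```

In this sketch the skip happens before any intermediate container is created, which is why the skipped variant has no empty hash left behind.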

Renaming stuff

Sometimes we would just like to have a summary, cherry-picking stuff in a complex data structure and having it all in a plain hash with everything at the top level.

To help with this, the path definition allows renaming by supporting a syntax like dst:src or dst=src, as in the following example:

$ romeo slice -i data.json "just_goo:\"goo'''\"" other:hash.'"sub-hash"'.baz
[
   {
      "just_goo" : "bar",
      "other" : "this is important"
   },
   {
      "just_goo" : "BAR",
      "other" : "THIS IS IMPORTANT"
   }
]

As a shorthand, it’s possible to reuse the last part of the source path as the destination name with the character @:

$ romeo slice -i data.json foo @:hash.'"sub-hash"'.baz
[
   {
      "baz" : "this is important",
      "foo" : "just a string"
   },
   {
      "baz" : "THIS IS IMPORTANT",
      "foo" : "JUST A STRING"
   }
]

The destination string is a path in itself, so it’s possible to move stuff around:

$ romeo slice -i data.json some.where:foo some.there:hash.'"sub-hash"'.baz
[
   {
      "some" : {
         "there" : "this is important",
         "where" : "just a string"
      }
   },
   {
      "some" : {
         "there" : "THIS IS IMPORTANT",
         "where" : "JUST A STRING"
      }
   }
]
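
Here is a small Python sketch of the renaming logic described in this section, covering dst:src, dst=src, the @ shorthand and nested destinations. It is only an approximation under stated assumptions: parse_definition and remap are hypothetical names, and keys are split on dots without any quoting support, which is why sub-hash appears unquoted below.

```python
def parse_definition(text):
    """Turn 'dst:src' or 'dst=src' into (dst_keys, src_keys)."""
    for i, ch in enumerate(text):
        if ch in ":=":
            dst, src = text[:i], text[i + 1:]
            break
    else:
        dst = src = text  # a bare path keeps its own position
    src_keys = src.split(".")
    # '@' means: name the destination after the last source key.
    dst_keys = [src_keys[-1]] if dst == "@" else dst.split(".")
    return dst_keys, src_keys

def remap(record, definitions):
    """Build a new record by copying each source path to its destination."""
    out = {}
    for text in definitions:
        dst_keys, src_keys = parse_definition(text)
        value = record
        for key in src_keys:
            value = value.get(key) if isinstance(value, dict) else None
        target = out
        for key in dst_keys[:-1]:
            target = target.setdefault(key, {})
        target[dst_keys[-1]] = value
    return out

record = {
    "foo": "just a string",
    "hash": {"sub-hash": {"baz": "this is important", "galook": "this is not"}},
}

moved = remap(record, ["some.where:foo", "some.there:hash.sub-hash.baz"])
short = remap(record, ["foo", "@:hash.sub-hash.baz"])
print(moved)
print(short)
```

Treating both sides as key lists is what makes renaming and moving the same operation: a one-element destination renames, a longer one relocates.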

Moving on: extracting the same slice over and over

If the same extraction has to be repeated over time, it just makes sense to save the slice definitions inside a file and reuse them over and over.

As we already saw, each definition is a mapping from a source to a destination, which can be represented in JSON like this, saved in file def01.json:

[
    { "dst": ["some", "where"], "src": ["foo"] },
    { "dst": ["some", "there"], "src": ["hash", "sub-hash", "baz"] }
]

The nice thing about it is that with the expanded array form we don’t need the fancy quoting any more and can just use the keys as they are.

Here’s the result of applying the slicing, loading it with option -d/--definition:

$ romeo slice -i data.json -d def01.json 
[
   {
      "some" : {
         "there" : "this is important",
         "where" : "just a string"
      }
   },
   {
      "some" : {
         "there" : "THIS IS IMPORTANT",
         "where" : "JUST A STRING"
      }
   }
]
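
As a rough illustration of why the array form is convenient, this Python sketch applies definitions shaped like the ones in def01.json to a record. It is a sketch under my own assumptions rather than romeo's implementation: apply_definitions is a hypothetical name, and the definitions are inlined instead of being read from a file so that the example is self-contained.

```python
import json

# Same content as def01.json, inlined for the sake of the example.
definitions = json.loads("""
[
    { "dst": ["some", "where"], "src": ["foo"] },
    { "dst": ["some", "there"], "src": ["hash", "sub-hash", "baz"] }
]
""")

def apply_definitions(record, definitions):
    """Build a new record by copying each src path to its dst path."""
    out = {}
    for definition in definitions:
        value = record
        for key in definition["src"]:
            # Assumption: a missing key yields None (null in JSON).
            value = value.get(key) if isinstance(value, dict) else None
        target = out
        for key in definition["dst"][:-1]:
            target = target.setdefault(key, {})
        target[definition["dst"][-1]] = value
    return out

record = {
    "foo": "just a string",
    "hash": {"sub-hash": {"baz": "this is important", "galook": "this is not"}},
}

result = apply_definitions(record, definitions)
print(json.dumps(result, indent=3))
```

Since every key is already a separate array element, there is no path string to parse at all, and keys like sub-hash need no special treatment.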

If needed, additional definitions can be provided by using the option multiple times, and direct slicing definitions can still be given on the command line like before (e.g. to add more data just for checking).

Interactive selection

Slicing also allows interactively selecting the pieces of interest from the first record, and then applying the selection to all records. This is done thanks to Term::Choose, which is an amazing piece of code!

In the following asciinema recording, after entering interactive mode:

moving is done with up/down keys
selection of a row is done with the space bar
completing the selection (including the currently highlighted row) is done with the enter/return key

Enough for today!

So… I guess this should be enough for showcasing Romeo’s sub-command slice. See you next time and stay safe!
