Reformatting JSON with jq

TL;DR

jq can do wonders to read JSON files and produce some output.

In recent post Thread of tweets via API we saved a thread of tweets as a JSON file. The format, though, is what comes out directly from the Twitter API, which might be too much or too… opinionated.

We’re up to serve tweets as little gems, but we might as well use the same software to provide wisdom from different sources… so we want to stick to a format that is good for us.

A simple target format

Without further ado, here’s our skeleton/example for a format:

{
   "title": "whatever"
   "url": "http://from.where.this/came/from",
   "quotes": [
      {
         "text": "whatever 1",
         "url": "http://from.where.this/came/1",
      },
      {
         "text": "...",
         "url": "..."
      }
   ]
}

This should be a good starting point: we can define an overall title and a URL to provide further context (or, at the bare minimum, acknowledgment to the real source) and a list of quotes, each comprising at least some text to display and, if available, a specific URL (like we would have with tweets in a thread, for example).

Turning to the format

The output from our Twitter API invocation is different from our target format, but here jq can come to the rescue with some filtering capabilities:

jq <twitter-thread.json >quotes.json \
   '{
      "title": "QuinnyPig on Presenting",
      "url": "https://twitter.com/QuinnyPig/status/1215710451343904768",
      "quotes": [
         .[] | {
            "text": .full_text,
            "url": ("https://twitter.com/QuinnyPig/status/" + (.id | tostring))
         }
      ]
   }'

There’s something to digest so let’s add some numbering:

 1 jq <twitter-thread.json >quotes.json \
 2    '{
 3       "title": "QuinnyPig on Presenting",
 4       "url": "https://twitter.com/QuinnyPig/status/1215710451343904768",
 5       "quotes": [
 6          .[] | {
 7             "text": .full_text,
 8             "url": ("https://twitter.com/QuinnyPig/status/" + (.id | tostring))
 9          }
10       ]
11    }'

In a nutshell:

  • the overall output is a single object (like we want) - this is the sense of wrapping everything inside braces (lines 2 and 11);
  • data that are specific to the thread we collected can be put as simple stuff in the output JSON (lines 3 and 4);
  • the tweets we saved from the Twitter API (that we are reading from a file named twitter-thread.json, line 1) are inside an array that we want to iterate (line 6 .[]);
  • the output of the iteration has to go back into an array (lines 5 and 10, i.e. enclosing the output inside square brackets);
  • for each object in the input array, we want to produce an object out (braces in lines 6 and 9);
  • objects in the array contain text taken from Twitter API’s full_text and a URL built from the tweet’s id

That’s it!

Enough for today… be sure to take a look at jq and also consider making it part of your #toolbox because yes, you can also find statically compiled versions in the download page!


Comments? Octodon, , GitHub, Reddit, or drop me a line!