Awk - the Tool I Keep Forgetting

TL;DR

Sometimes awk is just the tool you need to do the job.

I often forget that awk can be incredibly useful and concise. Which is bad, because being part of POSIX makes it near to ubiquitous, and being implemented in Busybox (see this post) also makes it part of the #toolbox. Definitely worth considering more!

One thing I use it often is to isolate one column out of tabular data, like e.g. output from the openstack client. As an example, take this run (copied from this article):

$ openstack image list
+--------------------------------------+---------------+-------------+
| ID                                   | Name          | Status      |
+--------------------------------------+---------------+-------------+
| 76a93b4b-566c-41eb-ab9d-26ce5be04010 | CentOS-7      | active      |
| 0c35ed32-a053-4481-8bac-efd3a5f9513e | CoreOS-Stable | active      |
| 0415d69c-28b3-4bfd-ab64-711c699c60b9 | Debian-8      | active      |
| cbd81524-f477-4d7c-963c-84699acf711b | Debian-9      | active      |
| b0a29bbc-dd13-4305-8380-043b86356edf | Fedora-25     | active      |
| 03b1467b-4c5d-4f85-aa62-035841a88aca | Fedora-29     | active      |
| 03a87e23-77e5-403b-a437-10e0b28b2583 | Ubuntu-14.04  | active      |
| 04f22a69-bdfe-4c2d-b996-aab8d69e4a0e | Ubuntu-16.04  | active      |
| fc6510a1-c057-4a74-bac9-3d8b74270038 | Ubuntu-17.10  | active      |
| b86ca11c-e7c1-4ae1-8580-62c464b19dfd | Ubuntu-18.04  | active      |
+--------------------------------------+---------------+-------------+

It’s easy to take the IDs only using awk. First, consider that it reads onle line at a time (much like grep, sed, …) and splits it into fields, by default separated by one or more spaces. So, in this example, the second line would be like this:

$1 is set to |
$2 is set to ID
$3 is set to |
$4 is set to Name
$5 is set to |
$6 is set to Status
$7 is set to |

So, if we want the IDs, we would have to print out the second field, i.e. $2:

$ openstack image list | awk '{print $2}'

ID

76a93b4b-566c-41eb-ab9d-26ce5be04010
0c35ed32-a053-4481-8bac-efd3a5f9513e
0415d69c-28b3-4bfd-ab64-711c699c60b9
cbd81524-f477-4d7c-963c-84699acf711b
b0a29bbc-dd13-4305-8380-043b86356edf
03b1467b-4c5d-4f85-aa62-035841a88aca
03a87e23-77e5-403b-a437-10e0b28b2583
04f22a69-bdfe-4c2d-b996-aab8d69e4a0e
fc6510a1-c057-4a74-bac9-3d8b74270038
b86ca11c-e7c1-4ae1-8580-62c464b19dfd

Uhm ok, we have to do some filtering here… how about only getting lines with active? My reflexes would point me to grep:

$ openstack image list | grep active | awk '{print $2}'
76a93b4b-566c-41eb-ab9d-26ce5be04010
0c35ed32-a053-4481-8bac-efd3a5f9513e
0415d69c-28b3-4bfd-ab64-711c699c60b9
cbd81524-f477-4d7c-963c-84699acf711b
b0a29bbc-dd13-4305-8380-043b86356edf
03b1467b-4c5d-4f85-aa62-035841a88aca
03a87e23-77e5-403b-a437-10e0b28b2583
04f22a69-bdfe-4c2d-b996-aab8d69e4a0e
fc6510a1-c057-4a74-bac9-3d8b74270038
b86ca11c-e7c1-4ae1-8580-62c464b19dfd

Much better, but overkill: awk can do the filtering pretty out of the box, by just preceding the {print $2} with a pattern:

$ openstack image list | awk '/active/{print $2}'
76a93b4b-566c-41eb-ab9d-26ce5be04010
0c35ed32-a053-4481-8bac-efd3a5f9513e
0415d69c-28b3-4bfd-ab64-711c699c60b9
cbd81524-f477-4d7c-963c-84699acf711b
b0a29bbc-dd13-4305-8380-043b86356edf
03b1467b-4c5d-4f85-aa62-035841a88aca
03a87e23-77e5-403b-a437-10e0b28b2583
04f22a69-bdfe-4c2d-b996-aab8d69e4a0e
fc6510a1-c057-4a74-bac9-3d8b74270038
b86ca11c-e7c1-4ae1-8580-62c464b19dfd

Which makes me think that I probably just use about 2% of awk…

ETOOBUSY 🚀 minimal blogging for the impatient