ETOOBUSY 🚀 minimal blogging for the impatient
Tricked by each
TL;DR
HAPPY NEW YEAR! Oh, by the way… beware of
each
in Perl.
I was trying to solve day 10 of Advent of Code 2016 and I stumbled upon a weird bug in my code.
When I say weird I mean weird as quantum physics can be weird. I know, that’s a lot of weirdness. But… the funny bug that was plagueing my code disappeared as I put additional read-only statements to look into it. Just to come back, of course, as I removed that investigation code.
It was amazing. And weird.
As a side note, I have to admit that leaving the printing statements actually removed the bug and provided me a solution to that day’s puzzle. At this point, anyway, I had another puzzle… another big, fat, juicy puzzle.
Let’s re-create something similar
Let’s consider the following code:
#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;
my %store = (
a => { 1 => 1, 3 => 1 },
b => { 2 => 1 },
c => { 4 => 1 },
);
say $_, ' -> ', (get_key_for(\%store, $_) // '*none*') for 1 .. 4;
sub get_key_for ($store, $target_value) {
while (my ($key, $values) = each $store->%*) {
return $key if $values->{$target_value};
}
return;
}
The function get_key_for
gets a hash of hashes as input, and returns
the key in the first hash corresponding to the sub-hash that contains
the target value.
The iteration with the say
should print this:
1 -> a
2 -> b
3 -> a
4 -> c
right? Well… let’s see a couple of runs:
$ perl each.pl
1 -> a
2 -> *none*
3 -> a
4 -> *none*
$ perl each.pl
1 -> a
2 -> b
3 -> *none*
4 -> c
It seems that we’re doomed to miss something!
OK, let’s investigate…
Maybe there’s something changing our hash of hashes behind the scenes, let’s print it then:
#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;
my %store = (
a => { 1 => 1, 3 => 1 },
b => { 2 => 1 },
c => { 4 => 1 },
);
for (1 .. 4) {
print_hash(\%store);
say $_, ' -> ', (get_key_for(\%store, $_) // '*none*');
}
sub get_key_for ($store, $target_value) {
while (my ($key, $values) = each $store->%*) {
return $key if $values->{$target_value};
}
return;
}
sub print_hash ($store) {
for my $key (sort {$a cmp $b} keys $store->%*) {
say ' ', $key, ' -> (', join(', ', keys $store->{$key}->%*), ')';
}
}
Let’s see…
$ perl each.pl
a -> (3, 1)
b -> (2)
c -> (4)
1 -> a
a -> (3, 1)
b -> (2)
c -> (4)
2 -> b
a -> (3, 1)
b -> (2)
c -> (4)
3 -> a
a -> (3, 1)
b -> (2)
c -> (4)
4 -> c
This is what I found interesting: the output is now as we were expecting, i.e. each value from 1 to 4 has a correponding key!
Let’s comment the print_hash
call…
#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;
my %store = (
a => { 1 => 1, 3 => 1 },
b => { 2 => 1 },
c => { 4 => 1 },
);
for (1 .. 4) {
#print_hash(\%store);
say $_, ' -> ', (get_key_for(\%store, $_) // '*none*');
}
sub get_key_for ($store, $target_value) {
while (my ($key, $values) = each $store->%*) {
return $key if $values->{$target_value};
}
return;
}
sub print_hash ($store) {
for my $key (sort {$a cmp $b} keys $store->%*) {
say ' ', $key, ' -> (', join(', ', keys $store->{$key}->%*), ')';
}
}
and back we are:
perl each.pl
1 -> a
2 -> *none*
3 -> a
4 -> *none*
Then something struck me…
Looking at the code, my eye fell on the each
statement. I read
something about it some time ago… something that always makes me a bit
uneasy in using it. Fact is that it’s useful… so I’m not that happy to
give it up completely.
I don’t remember where I ready about its dangers, or when, or what the dangers are actually, but it rang a bell and it was the right one.
The simplest explanation that I have is that each
keeps note of where
it was in the iteration, and restarts from there at the next one. Unless
it can’t, e.g. because the note was destroyed or reset in some way.
Hence, when I just run get_key_for
, depending on the random order of
the elements in the hash I can obtain two results:
-
if
a
comes last, in the first runeach
skipsb
and¢
for it and, at the next call (which is the one aiming atb
), it will just reach the end of the hash, ignoringb
and resetting the hash. Which means that, at the following iteration, we geta
again, and again we skipc
. -
if
a
comes first, thena
and at leastb
are printed. The nexta
is not found, so the state is reset andc
is found because the search has been reset.
There may be other cases, but these two above explain what we saw.
So… it’s a global, hidden variable that gives us headaches!
Why the different behaviour with print_hash
?
When we call print_hash
, we iterate through all the keys
of the hash
of hashes. This resets the scanning with each
, so we eventually get
every element because the each
tracker is reset before each call to
get_key_for
.
Lessons learned?
The bottom line is… try to avoid each
if possible.
The next, less drastic suggestion might be… if you want to use each
,
make sure that you really go through each
key/value pair,
otherwise… use something else!
So long…
… AND A HAPPY NEW YEAR!!!
To read more about this “bug”… Is perl’s each function worth using? contains some interesting considerations!