My Personal Experience

It took me some time to get past this challenge, this is more or less how it went:

Code the server part using Mojolicious. This was pretty quick.
Think about the simplest approach possible. The comparison lasts longer for longer matches, so I figured that by simply taking what took more time was always the winner.
The initial solution attempted to find one single character at a time in the hexadecimal expansion, aiming for a 40 characters HMAC. Once found, move on to the following character. Finding a single character means cycling through all the 16 alternatives (0 to 9, then a to f) and timing how much it took to get an answer.
The “network” introduction adds some noise to the timing, so I had to perform multiple probes on each single character to remove it. I settled on taking the minimum, thinking that this was the rock bottom I could obtain with that specific sequence.
This did not work. Failing to correctly detect a character in the 6th position basically means entering a black hole where the correct HMAC is lost for good. One workaround would have been increasing the number of probes, but this way it would take too much and I thought there should be a better way.
One weird thing was that there was a lot of noise, so I figured to remove some of it beforehand just to evaluate the algorithm in a more stable situation. I moved the comparison function into the client side, shutting off the server side temporarily. Still there was a lot of noise! I started suspecting that there was something wrong with my simulation of the time-leaking comparison function, but I was more determined to work on the solution side.
To cope with the black hole problem, the solution is to consider previously excluded sequences based on their merits. So I changed the algorithm to keep track of all attempted sequences, then consider them from the most promising to the least, adding new sequences on the way.
About at this time, I had an epiphany that, to cope with some of the timing noise, I should consider the ratio between the time it takes to evaluate a HMAC and the length of the attempted prefix. This gives us a (biased) estimation of how much time it took to compare each character, which sets a 4-characters prefix taking 210 ms in a better position than a 6-characters prefix taking 209 ms (due to noise). Things improved a bit on the speed side and I had a working solution at last!
I finally looked at the comparison function. The suggested way of adding a fixed delay just did not cut it and added too much noise by itself. I don’t know if it’s my virtual machine, Perl or whatever, so I changed the way I simulated the time leakage to something more adherent to what was written on the can. This was the subject of Cryptopals Diversion 2 - Simulating Time Leaks.
I switched back to the client-server network setup, and it was still working - yay!

Making multiple measurements

As already discussed, attacking one single character at a time can throw us in a black hole if we mess up even one single guess.

Let’s throw some numbers to make an estimation:

suppose we consider $\pm 3 \sigma$ as something that we can be hit by, even in conjunction, leading to a false positive/false negative condition (potentially leading to a black hole)
accumulating the error over $N$ characters means that the standard deviation over the whole correct prefix sequence is $\sqrt{N} \sigma$, so we’re considering $\pm 3 \sqrt{N} \sigma$
if the real delay (with bias) is $d$, we get a false detection when there is a bad luck event in which the good sequence is hit by $- 3 \sqrt{N + 1} \sigma$ and the bad sequence is hit by $+ 3 \sqrt{N} \sigma$ and this results in an error when:

\[\Delta = 3 (\sqrt{N} - (- \sqrt{N + 1})) \sigma > 6 \sqrt{N} \sigma > d \\ N > \left( \frac{d}{6 \sigma} \right)^2\]

if we assume $d = 50 ms$ and $\sigma = 5 ms$, this means that for $N > 3$ we start to increasingly consider feasible that we get into a black hole, with growing probabilities as $N$ grows.

Doing repeated measures for the same characters and taking the average can help us remove noise. Biases will still be there, but we can at least suppose that they will be consistent across all characters, so who cares if a planned delay of 50 ms per characters ends up on average on 54 ms?

According to what stuck in my memory after about 25 years, the standard error in estimating the average reduces with the square root of the measurements that we take. Hence, by making $k$ measurements, our standard deviation on a single character’s timing becomes $\frac{\sigma}{\sqrt{k}}$, so if we want to be reasonably sure to make it at least past character 39 we should take into considerination this:

\[\frac{6 \sqrt{40} \sigma}{\sqrt{k}} < d \\ k > 1440 \left( \frac{\sigma}{d} \right)^2 \\ k > 1440 \cdot 0.01 = 14.4\]

I know I wrote 39 and then used 40, but we are reversing inequalities here, so it’s better to consider $2 \sqrt{N + 1}$ instead of $2 \sqrt{N}$ as before.

So I guess that to be on the safe side we should do at least 20 or more measurements per character, and still we’re not definitely ruling out the black hole problem.

No Ruling Out

The approach I settled to address the black hole problem is to avoid ruling out possibilities. If the real prefix is penalized by a bad run (or group of runs) it will be put somewhere in the back, but it will eventually be considered again as we eliminate luckier but bad prefixes.

To do this I decided to adopt a (maximum) Priority Queue, where we track the following:

prefix
time
weight w, calculated as the ratio between the time and the length of the prefix

and we adopt ordering by the weight.

This is the Perl code of the main function:

sub crack_authenticator ($endpoint, $filename) {
   my $pq = PriorityQueue->new(
      before => sub ($x, $y) { $x->{w} > $y->{w} },
      id_of  => sub ($x) { $x->{prefix} },
      items  => [ { prefix => '', time => 10 , w => 10} ],
   );

   my $rock_bottom = estimate_rock_bottom($endpoint);
   say "rock bottom: <$rock_bottom>";

   while ('necessary') {
      my $candidate = $pq->dequeue; # take the best candidate
      my ($p_prefix, $p_time, $p_w) = $candidate->@{qw< prefix time w >};

      #my $correct_flag = $candidate->{is_correct}
      #   ? (BOLD . GREEN . '* ') : (BOLD . RED);
      #say "${correct_flag}expanding '$p_prefix' with time $p_time"
      #   . RESET;
      say "expanding '$p_prefix' with time $p_time";

      my $suffix = 'f' x (40 - 1 - length($p_prefix));
      for my $char ('0' .. '9', 'a' .. 'f') {
         my $c_prefix = $p_prefix . $char;
         my $c_time = check_hmac($endpoint, $filename, $c_prefix . $suffix)
            or return $c_prefix . $suffix;
         next unless $suffix;  # no suffix? complete hmac, but wrong!

         $c_time -= $rock_bottom;
         my $c_w = $c_time / (1 + length($c_prefix));
         if ($c_w > $p_w) { # this might be a candidate, double check
            for (1 .. 5) {
               my $time = check_hmac($endpoint, $filename, $c_prefix . $suffix);
               $c_time = $time - $rock_bottom if $time < $c_time;
               $c_w = $c_time / (1 + length($c_prefix));
               #say "   $c_time";
               last if $c_w < $p_w;
            }
         }

         #my $is_correct = is_correct_prefix($c_prefix);
         $pq->enqueue(
            {
               prefix => $c_prefix,
               time => $c_time,
               w => $c_w,
               #      is_correct => $is_correct,
            }
         );
         # say " --> right branch '$c_prefix' put at time $c_time"
         #   if $is_correct;
      }

      #say 'queue size: ', $pq->size;
   }
}

The estimation of the rock bottom with estimate_rock_bottom() aims at removing most of the bias due to networking (not the noise), to avoid to spoil the ratio between the time and the prefix length and concentrate on the real delay:

sub estimate_rock_bottom ($endpoint) {
   my $min;
   for (1 .. 100) {
      my $time = check_hmac($endpoint, 'foo.bar', '');
      $min = $time if $time < ($min //= $time);
   }
   return $min;
}

The check_hmac() function is where we call the remote web service and estimate the time it took to get a result back:

sub check_hmac_web ($endpoint, $filename, $hmac) {
   state $ua = Mojo::UserAgent->new;
   my $url = Mojo::URL->new($endpoint)->query(
      { file => $filename, signature => $hmac });
   my $start = time();
   return if $ua->get($url)->result->is_success;
   return time() - $start;
}

BEGIN { *check_hmac = \&check_hmac_web }

In the main function, after calculating a promising weight (i.e. one that is better than the weight of the parent node) we kick in a double check loop:

if ($c_w > $p_w) { # this might be a candidate, double check
   for (1 .. 5) {
      my $time = check_hmac($endpoint, $filename, $c_prefix . $suffix);
      $c_time = $time - $rock_bottom if $time < $c_time;
      $c_w = $c_time / (1 + length($c_prefix));
      #say "   $c_time";
      last if $c_w < $p_w;
   }
}

This helps ruling out false positives, although it does not help with false negatives.

Every prefix that comes out of the priority queue is expanded and tossed away. This does not make the black hole problem reappear, as the prefix survives in each of the 16 expansions that are then re-introduces in the queue (we reintroduce all 16 of them!).

Results with 50 ms:

$ perl 31.pl 
rock bottom: <0.00192809104919434>
expanding '' with time 10
expanding 'c' with time 0.124047994613647
expanding 'c4' with time 0.180799961090088
expanding 'c0' with time 0.171030759811401
expanding 'c5' with time 0.154335975646973
expanding 'c40' with time 0.204444885253906
expanding 'cd' with time 0.152482032775879
expanding 'cd4' with time 0.202331066131592
expanding 'cd4a' with time 0.252432823181152
expanding 'cd4a1' with time 0.303303956985474
expanding 'cd4a1e' with time 0.353938817977905
expanding 'cd4a1e7' with time 0.40339183807373
expanding 'cd4a1e7a' with time 0.451898813247681
expanding 'cd4a1e7a3' with time 0.50347900390625
expanding 'cd4a1e7a30' with time 0.554835796356201
expanding 'cd4a1e7a303' with time 0.603324890136719
expanding 'cd4a1e7e' with time 0.451467037200928
expanding 'cd4a1e7a3032' with time 0.652102947235107
expanding 'cd4a1e7a3032a' with time 0.702485799789429
expanding 'cd4a1e7a3032a9' with time 0.752608060836792
expanding 'cd4a1e7a3032a98' with time 0.803384780883789
expanding 'cd4a1e7a3032a982' with time 0.854714870452881
expanding 'cd4a1e7a3032a9822' with time 0.903710842132568
expanding 'cd4a1e7a3032a9822c' with time 0.952471971511841
expanding 'cd4a1e7a3032a9822c6' with time 1.00357484817505
expanding 'cd4a1e7a3032a9822c6a' with time 1.05259203910828
expanding 'cd4a1e7a3032a9822c6a2' with time 1.10382294654846
expanding 'cd4a1e7a3032a9822c6a2b' with time 1.15368986129761
expanding 'cd4a1e7a3032a9822c6a2f' with time 1.15329670906067
expanding 'cd4a1e7a3032a9822c6a2be' with time 1.20335674285889
expanding 'cd4a1e7a3032a9822c6a2be1' with time 1.25457811355591
expanding 'cd4a1e7a3032a9822c6a2be13' with time 1.30542802810669
expanding 'cd4a1e7a3032a9822c6a2be13a' with time 1.35318684577942
expanding 'cd4a1e7a3032a9822c6a2be13ad' with time 1.40328502655029
expanding 'cd4a1e7a3032a9822c6a2be13ad5' with time 1.45389580726624
expanding 'cd4a1e7a3032a9822c6a2be13ad51' with time 1.50396370887756
expanding 'cd4a1e7a3032a9822c6a2be13ad54' with time 1.5035228729248
expanding 'cd4a1e7a3032a9822c6a2be13ad546' with time 1.55374693870544
expanding 'cd4a1e7a3032a9822c6a2be13ad546d' with time 1.60320401191711
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc' with time 1.65330290794373
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc5' with time 1.70338177680969
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc56' with time 1.75459599494934
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc560' with time 1.80483102798462
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc5603' with time 1.8536388874054
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc56038' with time 1.90901684761047
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc56038e' with time 1.95335698127747
expanding 'cd4a1e7a3032a9822c6a2be13ad546dc56038e2' with time 2.00347876548767
cd4a1e7a3032a9822c6a2be13ad546dc56038e24
it took 852.985954999924 s

We hit a few detours (potential black holes!) but we eventually got back on track, so the whole thing is sound.

Stay safe and secure!

ETOOBUSY 🚀 minimal blogging for the impatient

Cryptopals 31 - Implement and break HMAC-SHA1 with an artificial timing leak

My Personal Experience

Making multiple measurements

No Ruling Out