TL;DR

For a new system where compatibility with MD5-hashed passwords is not needed, bcrypt seems to be a valid alternative.

I hope.

In A Future-Adaptable Password Scheme (1999), Niels Provos and David Mazières presented a way to make password hashing difficult at will. Instead of saying “let’s lay out an algorithm that requires 1 Gyear of brute-force attacks to crack a password, hoping that in 20 years this number has not dropped down to a few hours”, they took the approach of assuming that people will change their password from time to time, and the effort taken to hash them can be kept up to date with technical and technological improvements.

What’s password hashing anyway?

If I run a service with users that access with a password, I will have to keep them somewhere. So if user foo has password barbaz, I might have a plain text file like this:

...
foo:barbaz
...

Problem is, anybody with casual read access to that file can see the password and impersonate user foo. Ouch.

One idea might be to encrypt the passwords, so that only those with the right key can decrypt them and make a comparison when user foo tries to authenticate. Alas, this only moves the problem from one file to another: exposing that key means exposing all users in the passwords file.

But… do we really need to ever have cleartext passwords to perform authentication? The answer is now. If we use any one-way function, i.e. a function that:

  • given the same input, it provides us the same output;
  • given two different inputs, it reasonably provides different outputs;
  • given some output, it is extremely difficult to figure out one input that might generate it

then we can compare not the cleartext passwords, which would be “exposed”, but their correspondents after applying the function.

This is what password hashing is about.

For example, let’s suppose that we use the [plain MD5][] algorithm to do this hashing. First of all, when saving the password in the file, instead of the cleartext password barbaz we would save:

$ printf barbaz | md5sum
c3c23db5285662ef7172373df0003206  -

so we end up with:

...
foo:c3c23db5285662ef7172373df0003206
...

which is a clear improvement over the previous situation. For many reasons, though, this is not considered enough, so you have variants which aim to select better hashing functions for this specific application.

With hashed passwords, then, the workflow is the following:

User account creation
cleartext password --[ hashing ]--> hashed-password saved on disk
                                        |
                                        |      +--> authentication OK
                                        v      |
                                    [compare]--+
                                        ^      |
                                        |      +--> authentication FAIL
User login                              |
cleartext password --[ hashing ]--> hashed password

The idea in bcrypt

The basic insight in bcrypt is that a user with the right password don’t mind waiting some time to verify it - let’s say even one full second.

This, paired with a good hashing that leaves little (known) space to collisions, makes for a very promising system.

It is important to have low number of collisions (i.e. distinct passwords that yield the same output). Suppose that we have our one-second procedure that processes an input password and only outputs either 0 or 1. If my encoded password is… 0, on average it will take a handful of seconds to find another input password that yields 0 as well. If 0 and 1 are evenly distributed, it takes an average of 2 seconds to brute force it. So yes, taking a lot of time is only… one side of the coin, a low number of collisions is important too.

What takes 1 full second today, might possibly take 1-hundredth of seconds in a few years, or worse (well… better). Which means that our estimations about how much it would take to brute force today will not be the same in some time.

Hence, the underlying idea in bcrypt is to be able to tweak the wait time and adjust it to the evolving technology. As it allows going faster and faster, the algorithm is told to use more and more resources so that the computing time stays about the same. The key aspect in bcrypt is to include a cost parameter that is aimed exactly at this target.

Enough for low-quality explanations! On with the interesting stuff!

So, in Perl…

The Perl module Crypt::Eksblowfish::Bcrypt

implements the Blowfish-based Unix crypt() password hashing algorithm, known as “bcrypt”.

Well, exactly what we are after, thanks!

The typical workflow for hashed password discussed above is shown in the following example:

#!/usr/bin/env perl
use 5.024;
use warnings;
use experimental qw< postderef signatures >;
no warnings qw< experimental::postderef experimental::signatures >;

use FindBin '$Bin';
use lib "$Bin/local/lib/perl5";

use Crypt::Eksblowfish::Bcrypt qw< en_base64 bcrypt >;
use Test::More;

{
   my %db;
   sub save_user ($u, $p) { $db{$u} = $p }
   sub get_user_hashed_password ($u) { return $db{$u} // undef }
}

sub create_account ($username, $password, $cost, $salt = undef) {
   $salt //= pack 'C*', map {int rand 256} 1 .. 16;
   my $settings = sprintf '$2a$%02d$%s', $cost, en_base64($salt);
   save_user($username, bcrypt($password, $settings));
}

sub authenticate ($username, $password) {
   my $expected = get_user_hashed_password($username) // return;
   my $got = bcrypt($password, $expected);
   return $got eq $expected;
}

create_account(foo => 'barbaz', $ENV{BCRYPT_COST} // 9);

my $hp = get_user_hashed_password('foo');
like $hp, qr{\A\$2a\$}mxs, 'hashed password saved as expected';

ok ! authenticate(foo => $_), 'attempt wrong password'
   for qw< bar baz bar-baz >;
ok   authenticate(foo => 'barbaz'), 'authenticate with right pass';

done_testing;

The two helpers save_user()/get_user_hashed_password() represent a save/retrieve interface against our password storage facility, e.g. a file, a database, etc. In our case, we just use a Perl hash %db.

Both create_account() and authenticate leverage bcrypt to do the underlying work.

In create_account() we make sure to generate a $settings scalar that is compatible with the module, encoding all parameters to drive the hashing of the password, including of course the $cost and the $salt (this is automatically generated if missing, although the goodness of the auto-generated salt is dubious).

In authenticate() we show the algorithm we discussed before: we hash the provided password using the same parameters as we used upon account creation, and then compare the result with the one we stored in our password storage. If they are equal… we’re done.

Happy authenticating!