Hacker News new | ask | show | jobs
by tialaramex 2120 days ago
Rainbow Tables are a specific innovation in time-space tradeoffs (precomputation) rather than the name for all such attacks.

The specific clever trick in Rainbow Tables is the observation that rather than storing

  hash(password) : password
  5f4dcc : password
  c2fe67 : jimmy
  25d55a : 12345678
... we can build a function that takes the output from hash(password) to deterministically create a new candidate password, let's call this function pass(hash), and then chain the hash and our new function together as many times as we want. This lets us store much less data, while doing more work during our look-up phase.

  hash(pass(hash(password))) : password
  153dfc : password
  92fe87 : jimmy
  213eea : 12345678
Now if I find a hash 92fe87 in a password hash file, I do not learn that the password was jimmy, instead I need to compute pass(hash(jimmy)) and that's the password I was looking for. And if I find 39a4e6 which isn't in my list, I calculate hash(pass(39a4e6)) and discover that's 213eea, then I look this up in the table and I discover the password I need was 12345678. Obviously real Rainbow Tables don't just run the hash twice like this, but instead some fixed number of times chosen by the creator to trade off less space versus more work to find a password.
2 comments

I should actually fix this. What I've described above is basic "chaining", but Rainbow Tables are a further improvement still by Philippe Oechslin. The additional insight in Rainbow Tables is that we can reduce collisions in our hash-pass-hash-pass back and forth if we modify that pass function so that its behaviour varies by depth, this way if a collision occurs but at different depths in different chains (e.g. maybe the chain starting with password "password" hashes immediately to 5f4dcc but in another chain the value 5f4dcc is found for the password "j58X_m04" after six steps) the next call to pass() will diverge again, so the collision only wastes a small fraction of our precomputation effort. If the collision does happen at the same place in the chain, the final hash output will be identical to another chain, so it's easy to discover this problem and apply whichever mitigation seems appropriate.
Interesting, I haven't worked with rainbow tables very much since by the time I got into the world of hash cracking it had either been deprecated by salting or wasn't relevant (i.e. NTLM). That is a clever trick of trading back some of the space for extra time; I remember some of the rainbow table file sizes being ridiculous to the point of almost unusable haha.

edit: spelling