Hacker News new | ask | show | jobs
Hardening Perl's Hash Function (blog.booking.com)
84 points by rhu86 4598 days ago
2 comments

> An attacker could precalculate one or more suffixes such that H(x) == H( concat(x, suffix) ) which would then allow an attacker to trivially construct an infinite set of keys which would always collide into the same bucket. We hardened the hash by mixing in the length of the key into the seed.

This prevents making an infinite set of keys that collide, yes, but does not fix the problem completely. It only increases the request size required to perform the attack. But even with 8 character UTF-8 keys, you could generate 2^64 unique keys with the same length.

Note: IANAC (I am not a cryptographer.) But if I was looking for guarantees for worst-case performance, I'd (pretty much) never use a HashMap. If nothing else, because of the array resizing.

> But if I was looking for guarantees for worst-case performance, I'd (pretty much) never use a HashMap.

I wonder why none[1] of the major dynamic languages offer the option of running with the built-in associative-data structures implemented using a Red-Black Tree. Seems like an obvious thing to do.

There are interface issues, of course. In particular, for a new type to be usable as a key, it would need to provide an ordering rather than a hash function.

But if only built-in types are used as keys, then there might be no reason to change the interface at all. Just re-run the same program with a different option and you've traded some average-case performance for a guarantee on worst-case performance. And the Red-Black Tree is already written (somewhere out there ...). So why not?

[1] To my knowledge.

> I wonder why none[1] of the major dynamic languages offer the option of running with the built-in associative-data structures implemented using a Red-Black Tree. Seems like an obvious thing to do.

It's not a command-line option, but Perl lets you "tie" a hash to a different backing implementation, and there are some tree-based ones:

    use Tree::RB;
    tie my %hash, 'Tree::RB';
Wow, I had no idea. That's a nice feature.
I reckon because pointer-based trees are very slow compared to array-based data structures; these are constant factors due to memory accesses so they have nothing to do with the average and worst case performance you mention. For databases b-trees are appropriate, but for most uses of associative data structures hash tables are just very hard to beat.
The solution to this problem I've seen from actual cryptographers is Siphash: https://131002.net/siphash/

They address all of the problems of this sort by just designing a hash function that is good enough that collisions are difficult to find, I think even with the secret key.

Thats why I like Lua's hash implementation that adds a random seed to the hashing algorithm to limit those pathological side effects...
The article discusses why this alone may not be enough. (n.b. I've no experience of Lua's internals, so I can't say whether or not Lua is/isn't vulnerable)

Basically, if the attacker can view enough of an 'ordered' list of hashtable entries, they can calculate the seed and then generate appropriate keys to hit the worst case.

Wait, does that mean that iterating across the members of a Lua table is not in deterministic order? I assume it just uses the naive hash ordering for iterators... this is relevant because Lua is used as a scripting-engine SpringRTS which uses a lock-step-based multiplayer approach so anything non-deterministic risks breaking sync.
> Wait, does that mean that iterating across the members of a Lua table is not in deterministic order?

Yes. And it is documented as such. Any dependency on repeatable iteration order is a bug.

Huh. I'm not involved directly in SpringRTS, but I'm going to take a look at the code... I assume there's a compilation flag for Lua to disable that, or they've hacked it up and are using a custom interpreter, because protecting sync is paramount.
Just using "sort(hash-keys())" instead of "hash-keys()" should be a trivial patch that fixes this.
That's often insufficient. For MurmurHash{2,3}, for example, you can generate arbitrarily many keys that collide regardless of the random seed. Code is on the SipHash page:

https://131002.net/siphash/