Hacker News new | ask | show | jobs
by TheLoneWolfling 4598 days ago
> An attacker could precalculate one or more suffixes such that H(x) == H( concat(x, suffix) ) which would then allow an attacker to trivially construct an infinite set of keys which would always collide into the same bucket. We hardened the hash by mixing in the length of the key into the seed.

This prevents making an infinite set of keys that collide, yes, but does not fix the problem completely. It only increases the request size required to perform the attack. But even with 8 character UTF-8 keys, you could generate 2^64 unique keys with the same length.

Note: IANAC (I am not a cryptographer.) But if I was looking for guarantees for worst-case performance, I'd (pretty much) never use a HashMap. If nothing else, because of the array resizing.

2 comments

> But if I was looking for guarantees for worst-case performance, I'd (pretty much) never use a HashMap.

I wonder why none[1] of the major dynamic languages offer the option of running with the built-in associative-data structures implemented using a Red-Black Tree. Seems like an obvious thing to do.

There are interface issues, of course. In particular, for a new type to be usable as a key, it would need to provide an ordering rather than a hash function.

But if only built-in types are used as keys, then there might be no reason to change the interface at all. Just re-run the same program with a different option and you've traded some average-case performance for a guarantee on worst-case performance. And the Red-Black Tree is already written (somewhere out there ...). So why not?

[1] To my knowledge.

> I wonder why none[1] of the major dynamic languages offer the option of running with the built-in associative-data structures implemented using a Red-Black Tree. Seems like an obvious thing to do.

It's not a command-line option, but Perl lets you "tie" a hash to a different backing implementation, and there are some tree-based ones:

    use Tree::RB;
    tie my %hash, 'Tree::RB';
Wow, I had no idea. That's a nice feature.
I reckon because pointer-based trees are very slow compared to array-based data structures; these are constant factors due to memory accesses so they have nothing to do with the average and worst case performance you mention. For databases b-trees are appropriate, but for most uses of associative data structures hash tables are just very hard to beat.
The solution to this problem I've seen from actual cryptographers is Siphash: https://131002.net/siphash/

They address all of the problems of this sort by just designing a hash function that is good enough that collisions are difficult to find, I think even with the secret key.