Hacker News new | ask | show | jobs
by barrkel 2648 days ago
Yup, the chains can be interleaved. Often it's best to save the hash of each key as well as the key itself. Comparing the full hash (rather than the modulus of the hash) will eliminate many keys faster than doing a full key comparison, and having the full hash available means recreation on expansion is cheap, as all the keys don't need rehashing. The full hashes can then be used to distinguish between different chains if you decide to do backfilling, again cheaper than recalculating key hashes.

Of course, having the hashes available also speeds up recreation, should the tombstone approach be used.

Basically, keep the full hashes around :)

1 comments

Certainly not with open addressing as it will destroy all the cache-line advantages. with seperate chaining it's very common, esp. for resizing.
The key is usually indirected, which means a pointer, which is usually bigger than the hash code.

(And of course you could use parallel arrays if you're super concerned about cache lines, though the tradeoffs would very much depend on whether hash misses or hits are the expected mode.)

And if it’s not a pointer but a word-size integer, just use an invertible hash function (aka permutation aka block cipher) so you can store the hash code in place of the key :)