Hacker News new | ask | show | jobs
by 0815test 2653 days ago
I'm not sure that backshifting is as easy as "moving all items in the chain forward by one slot". Consider the hashtable [A, B1, B2, _, _], where one element is subsequently added after B2, giving [A, B1, B2, X, _]. Now when we remove B1 and shift B2 forward one slot ([A, B2, _, X, _]), we have to shift X forward if it hashes to the second or the third slot in the table, but not the fourth. So there might be multiple chains involved; if it was one contiguous chain only, we could simply arrange for the "hole" in it to be filled with no need for shifting all the items in it, and it would be quite efficient. However, it seems that it's not so simple.
1 comments

Yup, the chains can be interleaved. Often it's best to save the hash of each key as well as the key itself. Comparing the full hash (rather than the modulus of the hash) will eliminate many keys faster than doing a full key comparison, and having the full hash available means recreation on expansion is cheap, as all the keys don't need rehashing. The full hashes can then be used to distinguish between different chains if you decide to do backfilling, again cheaper than recalculating key hashes.

Of course, having the hashes available also speeds up recreation, should the tombstone approach be used.

Basically, keep the full hashes around :)

Certainly not with open addressing as it will destroy all the cache-line advantages. with seperate chaining it's very common, esp. for resizing.
The key is usually indirected, which means a pointer, which is usually bigger than the hash code.

(And of course you could use parallel arrays if you're super concerned about cache lines, though the tradeoffs would very much depend on whether hash misses or hits are the expected mode.)

And if it’s not a pointer but a word-size integer, just use an invertible hash function (aka permutation aka block cipher) so you can store the hash code in place of the key :)