Hacker News new | ask | show | jobs
by masklinn 1199 days ago
> is very fast in a linear design like Abseil's Swiss Tables.

It's not that fast, because branching is completely unpredictable when iterating the sparse array (though a very high load factor helps). That's why a few modern hash tables trade an additional indirection on lookup by storing entries in a dense array and making the sparse array an array of indices into that.

It makes iterating the map as efficient as iterating an array (or zipping two depending on data layout), it also saves some memory (especially if the indices array is adaptive), and has the neat side-effect of making the map conserve insertion order (though deletion is a bit of a concern, tradeoffs may have to be made with the other advantages).

1 comments

Good point about the branch cost.

For a general purpose hash map, I think promising to preserve order despite deletion (like Python's current dict) is a mistake in performance terms†. Python could afford a penalty here because their previous dict was so ludicrously bad that you won't notice. We can't know if the user of a general purpose hash map does 0% deletes or 50% deletes nor whether the ordering is valuable to them at all anyway.

† If we don't promise to preserve order then delete just swaps the removed item from the dense array with the last item, then removes the last item, very efficient, if we insist on preserving order we're either using tombstones which mean branches or we're shuffling half the bloody array for each deletion.