|
|
|
|
|
by mizmar
321 days ago
|
|
Something similar is used in swiss tables - metadata bucket entries are 1 bit occupancy marker and 7 MSB bits of hash (don't remember how tombstones are represented).
Metadata table is scanned first, the upper part of hash should discard filter out most colliding entries and the "false positives" lead to probe in entry table and key comparison (possibly optimized with full-hash comparison). Buckets use 16 bytes because sse2 and arm neon SIMD are basically guaranteed. I was shocked to read on how swiss tables - proclaiming to be the fastest hash tables - work. It's just open-addressing linear probe with no technique to deal with collisions and clustering. Plus the initial version rounded hashes%capacity to bucket size, thus using 4 less bits of hash, leading to even more collisions and clustering.
Yet the super fast probe with apparently made it not an issue? Mind-boggling.
(Later version allowed to scan from arbitrary position by mirroring first bucket as last.) |
|
Historically, the computational cost of excellent hash quality was too high, so the overall cost of a hash table was lower using a worse hash and more complex table design. Today, it is possible to generate high quality hashes with minimal computational cost, so complex hash table designs are less useful. Modern hash table performance can largely be reduced to the average number of cache lines (both data and instruction) touched by the operations.