Hacker News new | ask | show | jobs
by anfilt 2265 days ago
This has much better storage/memory bounds than a hash table. The end result is less cache misses.
2 comments

Sure, but 10K items is so little that you might as well not bother. If you're worried about memory usage you can also just scan the whole table each time (no such thing as a cache miss when the whole table fits in cache).
This is better than a hash table for membership testing. As for compared to table scan depends on your key size and number of keys.

This algorithm means you could easily have more than 10k especially for larger keys, and still likely reside in one of the cache levels. Heck a lot CPUs cant hold 10k items in L1. Like for 64 bit numbers thats 80Kb. Let alone for something like UUIDs or other larger sparse keys.

Also main memory access is aprox. delay of a few thousand cycles. Iterating a table of 10,000 items is going to be close to that. A hit in L1 is about 3 cycles. So a table scan is possibly slower or close to equal since a super scalar cpu has multiple execution ports.

Soon as you start getting any larger its easily gonna be slower to do a table scan.

This also gives you O(1) queries and being smaller means that table is more likely to reside in cache for any given look up. This smaller memory footprint is a great boon for any hash algorithm that involves look up.

All excellent points but no matter how impressive the savings it's not going to be interesting if the total time spent checking for membership is low to begin with. You'd need to be optimizing a very tight loop to even notice the difference between brute force and something more sophisticated like a hash map. And you'd need to be pretty damn desperate for more performance to even consider implementing a algorithm described in a paper in the hopes of beating the standard solutions.

It's pretty good fun though, so don't let pragmatism stop you. Just beware of premature optimization and all that.

A hash map is better than a hash table? I don't understand, genuine question, aren't they the same thing or am I getting confused about what you're saying?
Hash map and hash table are the same thing here. That’s what the GP suggests using. (And it’s the right suggestion.)

Bloom filters, cuckoo filters and XOR filters are all types of probabilistic lookup, which are different from normal hash tables.