Hacker News new | ask | show | jobs
by fargle 1348 days ago
it's not mean for storing all possible 32 bit integers. if you want to do that, it would take zero storage because the answer is just "yes".

what they are doing is using the fact that a hash table can be loaded to 80% or 90% and still work fine. but they can encode part of the data (the first byte) implicitly in the hashed address. it's kind of cheating because the user provides those bits back to the lookup function to get their answer, so they don't need to be stored. and saving one byte out of four means you store 75% which is a bigger gain than the loss due to the 80%-90% load-factor.

say you have a normal hash table with 90% load factor, and you need to store 100,000 32 bit words, it would take 32 * 100,000 / 0.9 = 3555555.

but if you use this (or some related) trick and only save part of the key, you'd have 24 * 100,000 / 0.9 = 2666666. which is even less than 32 * 100,000 if you used direct storage in a simple array.