Hacker News new | ask | show | jobs
by vaibhavsagar 1481 days ago
Doing a hashmap lookup is already very much like using a bloom filter: for any addressing scheme, if an element exists at a particular location, then either the key exists or it's a false positive. If no element exists, then there's no way the key exists. Adding an extra bloom filter would then come down to trading off space usage against a different probability of false positives, in addition to an extra lookup, and I think one would have to contrive an extremely unlikely scenario for it to make sense.
1 comments

Sometimes "Does anything in this category exist?" is constantly needed and must be as fast as possible - especially if the answer is often "No", while "What is it exactly?" is more rarely needed and it's OK that it's slower.

The Bloom filter gives you quick "No" answers for much less size than a hash table, in exchange for not being able to answer the second question at all.

We built distributed database software, years ago when 40GB was a lot of data, and that used Bloom filters so that if you asked "Show me everything about vaibhavsagar, tialaramex, dang, and celeritascelery" it could do a Bloom filter check, identify that there isn't any data about celeritascelery and dang and immediately cut in half the work to be done, before going to the disk to pull in actual hash tables where records for tialaramex and vaibhavsagar would exist if the positive from the Bloom filter wasn't a false positive.

The UX is nice too because it can make typos instant. For every municipality in Texas, show me the CesusCount. Instant zero records. Oh, right, not CesusCount I wanted CensusCount with an N.