| HN Mirror

Sometimes "Does anything in this category exist?" is constantly needed and must be as fast as possible - especially if the answer is often "No", while "What is it exactly?" is more rarely needed and it's OK that it's slower.

The Bloom filter gives you quick "No" answers for much less size than a hash table, in exchange for not being able to answer the second question at all.

We built distributed database software, years ago when 40GB was a lot of data, and that used Bloom filters so that if you asked "Show me everything about vaibhavsagar, tialaramex, dang, and celeritascelery" it could do a Bloom filter check, identify that there isn't any data about celeritascelery and dang and immediately cut in half the work to be done, before going to the disk to pull in actual hash tables where records for tialaramex and vaibhavsagar would exist if the positive from the Bloom filter wasn't a false positive.

The UX is nice too because it can make typos instant. For every municipality in Texas, show me the CesusCount. Instant zero records. Oh, right, not CesusCount I wanted CensusCount with an N.