Hacker News new | ask | show | jobs
by _jsdw 2140 days ago
“This looks really promising! A Bloom Filter that represents a set of million items with a false-positive rate of 0.01 requires only 9585059 bits and 7 hash functions.“

My immediate thought here was that this wasn't very space efficient as it requires over 9 bits per item?

I'd have to dig more into this though to confirm or deny my intuition!

Edit: this page agrees with the blog post and shows some interesting graphs https://hur.st/bloomfilter/?n=1000000&p=0.01&m=&k=

1 comments

Yes and no. It requires 9 bits per item in the filter, but the total keyspace may be much larger. If your items are keyed based on 64-bit identifiers, you've compressed things by a factor of 7.1 .
Furthermore, most used of bloom filters I've seen are indexing something notably larger than 64-bits, and usually variable length. Arbitrary names, URLs/URIs, etc.

If you are looking for a key value then chances are it does exist in the DB so the bloom filter won't save you anything as you are then hitting slower storage to read the item that the key represents anyway.

Bloom filters are suited for situations where you are looking for larger non-keyed data such as URLs, where there is a fairly good chance that what you are looking for isn't in the data yet so the "no false negatives" property is important for saving further accesses (which is why cache arrangements is where you'll often find these structures used).