Hacker News new | ask | show | jobs
by jedberg 2346 days ago
This is really great!

For those that don't know, bloom filters are really good at determining when something is not in a set. So if says "not in set" that's 100% accurate and super fast.

We used this to our advantage at reddit. When you load a comments page, we had to determine if you've voted on anything. So with the bloom filter in front, the query for "have they voted on anything here" was very quick if the answer was no.

But we still had to make the query. So another hack we did was to put a value on your user object to track the last time you interacted with the website. If the comments page was made after your last interaction, then we didn't even have to do the query.

With this, we wouldn't have to do the two step process -- it sounds like this would be super fast when it was time boxed. Maybe...

2 comments

BloomFilters make for good indices, too. My first encounter with them was from the BigTable paper.

We used it in front of a Radix Tree that at places had too much depth. The filter took something like 2MiB at 7% false-positive rate vs 100+ MiB for the Radix Tree. We later bitmap-compressed the Radix Tree [0] to under 5MiB at the cost of making the lookups very very slow (P99 at 4ms, P50 at 1ms compared to P99 at 400ns before, iirc) but added a TinyLFU-based [1] lean-HAMT [2] cache of 512KiB in the front to speed lookups of popular items and a Robinhood cache [3] of 256KiB to take care of the long-tail KVs.

[0] https://news.ycombinator.com/item?id=2348619

[1] http://highscalability.com/blog/2016/1/25/design-of-a-modern...

[2] https://blog.acolyer.org/2015/11/27/hamt

[3] https://blog.acolyer.org/2018/10/26/robinhood-tail-latency-a...

> For those that don't know, bloom filters are really good at determining when something is not in a set. So if says "not in set" that's 100% accurate and super fast.

Agreed. It allows for false positives, but disallows false negatives.