by joeblau 3277 days ago
This was the key to our data analytics URL de-duping platform back in 2011. We were pulling in 50k social media messages an hour, and there were lots of duplicate links running through our pipeline. We had a 100GB bloom filter backed by Redis to keep a list of all links that came through our system, and it worked beautifully.
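For readers who have not seen the technique, here is a minimal in-memory sketch of bloom-filter de-duping. It is only an illustration of the idea, not the Redis-backed system described above: the names `BloomFilter` and `dedupe`, the SHA-256 double-hashing scheme, and the sizes used are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny in-memory bloom filter (illustrative; not Redis-backed)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit
        # slices of a single SHA-256 digest (an assumed scheme).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if it was possibly seen before."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def dedupe(urls, bloom):
    """Yield URLs the filter has not seen yet.

    A false positive means a genuinely new URL is occasionally dropped;
    a URL that was already added is never reported as new.
    """
    for url in urls:
        if not bloom.add(url):
            yield url


bloom = BloomFilter(num_bits=1 << 20, num_hashes=7)
links = ["http://a.example/1", "http://b.example/2", "http://a.example/1"]
unique = list(dedupe(links, bloom))
```

The trade-off is that memory stays fixed no matter how long the URLs are, at the cost of a small, tunable chance of treating a new link as a duplicate.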

A 100GB bloom filter? What does that mean? How many hash functions, and how many bits?
I would guess that a 100GB bloom filter would have 800 gigabits.

You can make some guesses. With 7 hashes and a false positive probability of 1%, a bloom filter sized for a cardinality of 100 billion elements comes out a little over 100GB (roughly 120GB).
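If anyone wants to check that guess, here is a rough sketch using the standard bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hashes. The function names `bloom_size` and `bloom_capacity` are made up for this example, and GB is taken as 10^9 bytes.

```python
import math


def bloom_size(n, p):
    """Return (bits, hash_count) for n elements at false positive rate p."""
    bits = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    hashes = round(bits / n * math.log(2))
    return bits, hashes


def bloom_capacity(bits, p):
    """Return how many elements a filter of `bits` bits holds at rate p."""
    return int(bits * (math.log(2) ** 2) / -math.log(p))


# 100 billion elements at a 1% false positive rate.
bits, hashes = bloom_size(100_000_000_000, 0.01)
print(f"{bits / 8 / 1e9:.1f} GB, {hashes} hashes")  # 119.8 GB, 7 hashes

# Going the other way: 100GB is 800 gigabits of filter.
capacity = bloom_capacity(100 * 10**9 * 8, 0.01)
print(f"about {capacity / 1e9:.1f} billion elements")
```

So at 1% the filter needs about 9.6 bits per element, and a filter of exactly 100GB would hold somewhat fewer than 100 billion links at that rate.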