Hacker News
by bradleyjg 3186 days ago
I read a few of these articles a couple of years ago, along with a few of the inevitable cuckoo hash filter rejoinders.

I think they are neat algorithms and I'm glad to have come across them. That said, I have yet to find a problem in my day-to-day work that required set membership, with space at a premium, and where false positives were acceptable. So I've never used either in anger.

4 comments

I use them at work as a cache during the data ingestion phase (analytics). I have to store a unique URL for each page a user visits, and each page generates a lot of requests. So I store the URLs in a Bloom filter and hit the DB only when contain() returns False. It's a neat little thing that saves me thousands of unnecessary database hits per second.
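For anyone who wants to see the shape of that pattern, here is a minimal sketch in Python. It is not the commenter's actual code: the BloomFilter class, the ingest() helper, the db_insert callback, and the sizing defaults (2^20 bits, 7 hashes) are all hypothetical names and values picked for illustration, and the double-hashing scheme over SHA-256 is just one common way to derive the bit positions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (illustrative): num_bits bits, num_hashes positions per item."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        # Sizing defaults are assumptions, not tuned for any real workload.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item):
        # False means "definitely never added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def ingest(urls, seen, db_insert):
    """Call db_insert(url) only when the filter says the URL is definitely new.

    db_insert is a hypothetical stand-in for the real database write.
    Returns the number of times the database was hit.
    """
    db_hits = 0
    for url in urls:
        if not seen.contains(url):
            db_insert(url)
            seen.add(url)
            db_hits += 1
    return db_hits
```

Called as ingest(request_urls, BloomFilter(), some_db_write), repeated requests for the same page skip the database. The trade-off is that a false positive makes the filter claim a genuinely new URL was already seen, so that URL is never written; whether that is acceptable depends on the analytics use case and on how the filter is sized.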
I have used them for text segmentation. It's an extremely quick way to test for membership in a set (30+ million tokens in my case) that would otherwise be too large to hold in main memory.
If you are bored with Bloom and cuckoo filters, then check out quotient filters. Quotienting was one of those mind-blowing things for me.
Thanks for the reading list! :-)
They're very useful in large scale web crawling/scraping. I use them for a number of things in this field.
Also in distributed brute-forcing of encryption standards.