Hacker News new | ask | show | jobs
by cmurphycode 3188 days ago
Having many filters is worse, but not hugely so. I believe the problem is reducible to the fact that you'll have many smaller filters thus the random distribution hurts you a bit more.

The key word to google is "blocked bloom filter" e.g. as proposed in http://algo2.iti.kit.edu/documents/cacheefficientbloomfilter...

Here's a nice paper with some improvements http://tfk.mit.edu/pdf/bloom.pdf

We use blocked bloom filters for a couple of reasons, but one major benefit is the memory locality (our "bloom filter" is 32GB or larger, so it's handy & fast to be able to address it with separate "pages" which are really just individual bloom filters.)