Hacker News new | ask | show | jobs
by sandwell 4616 days ago
This looks very cool, however I was able to get lots of false positives using the first few letters of the alphabet as test data. Could you reduce these by using k hashes and k bit vectors with m bits, i.e. one bit vector for each hash algorithm, at the expense of a little extra computation?
2 comments

That's probably because it's a tiny filter. In reality you'd calculate the size more appropriately:

http://hur.st/bloomfilter?n=4&p=1.0E-20

Here you can give it a false positive rate you're happy with and a number of elements. It'll give you the optimal size and number of hashes to use. Your suggestion would use far too much space.

The expense is space, not so much the extra computation.

The nice thing about Bloom filters is that (if your hash functions are good), that you compute the vector length given the number of set elements and the probability of false positives that is acceptable in your application.