| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sandwell 4616 days ago
	This looks very cool, however I was able to get lots of false positives using the first few letters of the alphabet as test data. Could you reduce these by using k hashes and k bit vectors with m bits, i.e. one bit vector for each hash algorithm, at the expense of a little extra computation?

2 comments

IanCal 4615 days ago

That's probably because it's a tiny filter. In reality you'd calculate the size more appropriately:

http://hur.st/bloomfilter?n=4&p=1.0E-20

Here you can give it a false positive rate you're happy with and a number of elements. It'll give you the optimal size and number of hashes to use. Your suggestion would use far too much space.

link

microtonal 4615 days ago

The expense is space, not so much the extra computation.

The nice thing about Bloom filters is that (if your hash functions are good), that you compute the vector length given the number of set elements and the probability of false positives that is acceptable in your application.

link