by Thorrez 417 days ago
>only be necessary to do a deeper search if the prefix matches

There are 5 billion emails in at least 1 breach and 16 million prefixes, so each prefix holds about 300 breached emails on average. Almost all, if not all, prefixes have at least 1 email in a breach, so almost all prefixes match. I don't see why it's useful to spend a bunch of effort optimizing the very rare case of a prefix not matching.
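To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes the "16 million prefixes" are 6-hex-character hash prefixes (16^6 = 16,777,216) and that emails hash uniformly across them. Both are my assumptions, and the function names are just for illustration.

```python
import math

N_EMAILS = 5_000_000_000  # breached emails, figure from the comment above
N_PREFIXES = 16 ** 6      # 16,777,216; ASSUMES 6-hex-char prefixes ("16 million")

def avg_per_prefix(n=N_EMAILS, p=N_PREFIXES):
    """Average number of breached emails that land under one prefix."""
    return n / p

def prob_prefix_empty(n=N_EMAILS, p=N_PREFIXES):
    """Chance that a given prefix has no breached email at all.

    ASSUMES uniform hashing: each email misses the prefix with
    probability (1 - 1/p), so all n miss with (1 - 1/p)**n.
    Computed in log space to avoid rounding error.
    """
    return math.exp(n * math.log1p(-1.0 / p))

def expected_empty_prefixes(n=N_EMAILS, p=N_PREFIXES):
    """Expected count of prefixes with zero breached emails."""
    return p * prob_prefix_empty(n, p)

print(f"{avg_per_prefix():.0f} emails per prefix on average")
print(f"P(empty prefix) = {prob_prefix_empty():.3g}")
print(f"expected empty prefixes = {expected_empty_prefixes():.3g}")
```

Under those assumptions the average comes out to about 298 emails per prefix, and the expected number of empty prefixes is effectively zero, so a prefix-level filter would almost never get to say "no".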

Now, if the Bloom filter checked emails instead of checking prefixes, that would be useful. However, a Bloom filter of 5B elements with a 10% false positive rate would be about 2.8 GiB (roughly 3 GB), which is prohibitively large.

https://hur.st/bloomfilter/?n=5g&p=10&m=&k=
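For anyone who wants to check the size without the calculator, this is a minimal sketch using the standard optimal-size formula for a Bloom filter, m = -n ln(p) / (ln 2)^2 bits, with k = (m/n) ln 2 hash functions. The function names are mine, not from any library.

```python
import math

def bloom_bits(n, p):
    """Optimal bit count for n items at false-positive rate p (0 < p < 1)."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

def optimal_hashes(m, n):
    """Number of hash functions that minimizes false positives for m bits, n items."""
    return max(1, round(m / n * math.log(2)))

n = 5_000_000_000  # 5B emails, from the comment above
p = 0.10           # 10% false-positive rate

bits = bloom_bits(n, p)
size_bytes = bits / 8
print(f"{size_bytes / 2**30:.2f} GiB ({size_bytes / 1e9:.2f} GB), "
      f"k = {optimal_hashes(bits, n)} hash functions")
```

That works out to about 4.8 bits per email, which is where the multi-gigabyte total comes from. Tightening the false-positive rate only makes it bigger.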