|
|
|
|
|
by qw
418 days ago
|
|
> The client then searches through the list to see if the desired email is in the list or not. The initial prefix check would probably reduce the amount of lookups necessary, as it would only be necessary to do a deeper search if the prefix matches. |
|
There are 5 billion emails in at least 1 breach and 16 million prefixes. Almost all if not all prefixes have at least 1 email in a breach. So almost all prefixes match. I don't see why it's useful to spend a bunch of effort optimizing the very rare case of a prefix not matching.
Now, if the bloom filter checked emails instead of checking prefixes, that would be useful. However, a bloom filter of 5B elements with a 10% false positive rate would be 2.8 GB, which is prohibitively large.
https://hur.st/bloomfilter/?n=5g&p=10&m=&k=