|
|
|
|
|
by haberman
2306 days ago
|
|
There are 555,278,657 passwords in the database. With a Bloom Filter, you could quickly rule out potential inputs. Even better, there is no need to hash the input, because... it's already a cryptographic hash. The input SHA-1 provides 160 bits of hash. If we divide that up into 5 hash values of 32 bits, we can get a 3% false positive rate with a 483 MiB Bloom Filter (which will easily fit in memory). https://hur.st/bloomfilter/?n=555M&p=0.03&m=&k= This will be blindingly fast. We're talking 5 random reads from memory. Even in the worst case of 5 cache misses, we're still well under 1us. This will let us return "not found" for 97% of inputs that aren't in the database. If we get a hit there, then we could turn to a larger bloom filter for greater accuracy, but we'd have to actually hash the key to get more hash bits. Of course if you get hits for all your bloom filters, you still have to do a real lookup to positively confirm that the key is in the database. |
|
As I replied elsewhere, please do not ignore this part for good user experience. I've seen it ignored in open source projects (Keycloak). You don't wanna block perfectly fine passwords for reasons unknown to the user because of false positives. That might cause unwanted reactions at your user's side ("was my password leaked!?!?").