| Does 1 in 1000 false positive mean that 1 in every 1000 passwords not on the list returns a true? If so, that's not good because good passwords vastly outnumber the bad ones so even a small false positive rate makes a positive very very unlikely to be a true one. For example: let's assume all passwords (including all of the ones on haveibeenpwned list) are exactly 8 chars long and composed of 50 characters. This gives a lot of headstart to this filter since the way possible passwords outnumber ones on the list is extremely more crushing in reality due to many possible lengths, unbounded length and there being 95 printable ASCII chars to use (even alnums themselves is 62 chars already). passlen = 8
passchars = 50
badpasses = 501636842
allpasses = passchars ** passlen
falsepos = 0.001
print(badpasses / (badpasses + falsepos * allpasses))
The above script prints 0.012679079642336052, which means (in these extremely forgiving scenario from above where possible passwords are limited to next to nothing compared to real life) only just over one in a hundred positives is a true positive.This apparently comes up with FP rates for cancer and AV and such (most people dont' have cancer, most exes aren't viruses, etc.). I might be wrong, feel free to point that out. I'm not a statistician or a doctor, I just had a statistics class as part of my CS degree. :) Edit: I've found the English terms and explanations: https://en.wikipedia.org/wiki/Sensitivity_and_specificity https://en.wikipedia.org/wiki/False_positive_paradox |
But actually, probably more like 1 in 10 or 1 in 50 users use bad passwords. If we use 1 in 50, we get that 19/20 positives are true positives and only 1/20 are false positives. With 100,000 users we'd expect to trip the filter 2,000 times, only 100 of which are false positives.