Hacker News new | ask | show | jobs
by ascar 2306 days ago
> Of course if you get hits for all your bloom filters, you still have to do a real lookup to positively confirm that the key is in the database.

As I replied elsewhere, please do not ignore this part for good user experience. I've seen it ignored in open source projects (Keycloak). You don't wanna block perfectly fine passwords for reasons unknown to the user because of false positives. That might cause unwanted reactions at your user's side ("was my password leaked!?!?").

2 comments

The false positive rate can be made arbitrarily small. I have an implementation with a fp rate of 1:1.000.000 with a filter size of just 1.8 GB (https://github.com/adewes/have-i-been-bloomed). The cost of lowering the rate is logarithmic so you could go to much lower values for little cost (1:1.000.000.000 would be 2.8 GB) so false positives are not really a problem in practice. If you really want you could still perform an exact check against a DB if the filter returns true to rule out false positives with certainty, though at one false positive for one billion requests this might be exaggerated.
That makes sense. I was thinking of this more as a fun algorithms optimization challenge, not something to actually put into a production setting with real users. I agree that for real users you would especially not want to skip the last step.