Are you aware of the pitfall I mentioned here? https://news.ycombinator.com/item?id=17323068

Just making the possible passwords 10 chars long, drawn from a set of 62 chars (lowercase + uppercase + digits), gives me a precision of about 0.05 (only about 1 in 20 hits is a real match) at a 0.00000001 (0.000001%) FP rate.

Edit: (This is a reply to w8rbt below, posted here since HN has decided I cannot post anymore for the next who knows how many hours.)

I'm aware of this and wouldn't treat a bloom filter as the final decision maker when there is a hit, but many people seem to act like it is intended to be one and bring up its FP rate as being low enough, e.g.:

1. This comment and several of its replies: https://news.ycombinator.com/item?id=17322692

2. This comment telling me false positives don't matter: https://news.ycombinator.com/item?id=17323438

3. Neither of the linked bloom filters mentions that if there is a hit you should check with the https://haveibeenpwned.com/ API.

4. (Of course, this is HN after all!) My comments pointing this paradox/pitfall/whatever out are sitting at negative point scores.

I've literally lowered my score for saying "hey, 1 in 1000 FP isn't actually super accurate and almost sure hit is a good hit". And there are one or two green-named users with freshly made accounts who seem to be criticizing this bloom filter, exactly because of this BS.
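For anyone who wants to check the 0.05 figure, here is a rough sketch of the base-rate arithmetic. The comment does not say how many passwords are in the filter, so `PWNED_COUNT` (about 500 million) is an assumption, as is the idea that candidate passwords are spread uniformly over the whole 62^10 space and that every breached password falls inside it. The function and variable names are made up for illustration.

```python
# Base-rate sketch: what fraction of Bloom filter hits are real matches?
# Assumptions (not stated in the comment): the filter holds about 500 million
# breached passwords, all of them inside the 62^10 space, and candidate
# passwords are drawn uniformly from that space.

ALPHABET_SIZE = 62          # lowercase + uppercase + digits
PASSWORD_LENGTH = 10
FP_RATE = 1e-8              # 0.000001%
PWNED_COUNT = 500_000_000   # assumed size of the breached set


def hit_precision(alphabet_size, length, fp_rate, pwned_count):
    """Return the share of filter hits that are true members of the set."""
    total = alphabet_size ** length      # every possible password
    clean = total - pwned_count          # passwords not in the breached set
    false_hits = clean * fp_rate         # clean passwords the filter flags anyway
    true_hits = pwned_count              # a Bloom filter has no false negatives
    return true_hits / (true_hits + false_hits)


precision = hit_precision(ALPHABET_SIZE, PASSWORD_LENGTH, FP_RATE, PWNED_COUNT)
print(f"share of hits that are real: {precision:.3f}")
```

Under those assumptions the result comes out a little above 0.05: the handful of billions of false hits from the huge clean space swamp the half billion true ones, even at a 0.000001% FP rate. A larger or smaller breached set, or a non-uniform spread of real-world passwords, would move the number.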
Space/Time Trade-offs in Hash Coding with Allowable Errors, by Burton H. Bloom
https://dl.acm.org/citation.cfm?doid=362686.362692