

by FRex 2966 days ago

Are you aware of the pitfall I mentioned here? https://news.ycombinator.com/item?id=17323068

Just restricting the possible passwords to 10 chars drawn from a set of 62 (lowercase + uppercase + digits) gives me 0.05 sensitivity at a 0.00000001 (0.000001%) FP rate.
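A rough sketch of the arithmetic that seems to be behind that figure, reading "sensitivity" as the chance that a reported hit is a real hit. The comment gives neither the size of the stored set nor the model, so two things here are assumptions: about 500 million stored passwords, all inside this 62^10 space, and candidate passwords drawn uniformly from that space.

```python
# Assumptions (NOT stated in the comment): the filter stores about
# 500 million passwords, all of them inside the 62^10 space, and the
# passwords being checked are drawn uniformly from that space.
ALPHABET_SIZE = 62
PASSWORD_LENGTH = 10
STORED_PASSWORDS = 500_000_000  # assumed set size
FP_RATE = 1e-8                  # 0.000001%, as in the comment


def hit_is_real_probability(stored, space, fp_rate):
    """P(password really is in the set | filter reports a hit), by Bayes' rule."""
    prior = stored / space                # chance a random password is in the set
    true_hits = prior * 1.0               # a Bloom filter has no false negatives
    false_hits = (1.0 - prior) * fp_rate  # absent passwords that still hit
    return true_hits / (true_hits + false_hits)


space = ALPHABET_SIZE ** PASSWORD_LENGTH
ppv = hit_is_real_probability(STORED_PASSWORDS, space, FP_RATE)
print(f"chance that a hit is real: {ppv:.3f}")
```

Under those assumptions the result lands a little above 0.05, so roughly 19 out of 20 hits would be false positives even at that FP rate.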

Edit:

(This is a reply to w8rbt below, posted here since HN has decided I cannot post anymore for the next who knows how many hours.)

I'm aware of this and wouldn't treat a bloom filter as the final decision maker if there is a hit, but many people seem to act as if it is intended to be one and bring up its FP rate as being low enough, e.g.:

1. This comment and several of its replies: https://news.ycombinator.com/item?id=17322692

2. This comment telling me false positives don't matter: https://news.ycombinator.com/item?id=17323438

3. Neither of the linked bloom filters mentions that if there is a hit you should check with the https://haveibeenpwned.com/ API.

4. (Of course, this is HN after all!) My comments pointing out this paradox/pitfall/whatever are sitting at negative point scores. I've literally lowered my score for saying "hey, 1 in 1000 FP isn't actually super accurate and almost sure hit is a good hit". And there are one or two green-named users who seem to be criticizing this bloom filter from freshly made accounts, exactly because of this BS.

1 comment

w8rbt 2966 days ago

Read the original Bloom filter paper from 1970. It covers all of this and it's very short. If the membership test returns 'probably', then you either do the more expensive test if you need to know for certain, or you don't use a Bloom filter to begin with:

Space/Time Trade-offs in Hash Coding with Allowable Errors, Burton H. Bloom

https://dl.acm.org/citation.cfm?doid=362686.362692
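A minimal sketch of that two-stage pattern, assuming a toy filter built on SHA-256. The `BloomFilter` class, the `is_member` helper and the `exact_lookup` callable are all hypothetical names for illustration; in the pwned-passwords case the expensive test would be whatever authoritative lookup you trust, and a plain Python set stands in for it here.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k bit positions derived from salted SHA-256."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Salt the hash with the index i to get k different positions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not"; True only means "probably".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def is_member(item, bloom, exact_lookup):
    """Cheap filter first; run the expensive exact test only on a hit."""
    if not bloom.might_contain(item):
        return False            # definite miss, expensive test skipped
    return exact_lookup(item)   # "probably": confirm before trusting it


# Stand-in for the expensive authoritative store (hypothetical data).
known = {"password", "123456", "qwerty"}
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for pw in known:
    bloom.add(pw)

print(is_member("password", bloom, known.__contains__))
print(is_member("correct horse battery staple", bloom, known.__contains__))
```

The filter alone never decides a positive here: a hit only earns the item a second, exact check, and a miss is the only answer taken at face value.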
