Hacker News new | ask | show | jobs
by marcan_42 2307 days ago
It's not just saving a few bits. A bloom filter requires a lot less storage than just chopping the hashes for the same false positive performance, because it uses multiple hashing steps (in my default config, 11, but this is tunable depending on false positive rate and input size), and all of them have to hit bits that are set in the filter for the match to be positive.

Pwned passwords 2.0 was half a billion passwords. That's 29 bits. To get a 0.05% false positive rate you need an additional 11 bits or so. That's 40 bits per hash, or 20 billion bits total, and then you need to binary search it (29 lookups) or make it into a tree and add bookkeeping structures.

The equivalent performance bloom filter only takes 5 billion bits of storage and 11 single bit checks in a bitmap.