by marcan_42
2306 days ago
No, a bloom filter requires k random projections of the input onto a range of m elements (that is, k independent hash values, each indexing one of m bits). In my smaller sample configuration, k=11 and m~=5000000000. log2(5000000000^11) ~= 354 bits (and you need a few more bits of padding to get better uniformity). SHA-1 hashes are only 160 bits, so you can't use the original hash directly to index into the bloom bitmap. You need to hash again at least twice (or once with something like SHA-512). But really, hashing is so cheap that there's no point in trying to be clever like this for an on-disk implementation. My code is designed to take any strings as input, whether they are already HIBP hex hashes or something else. If this were intended to be a high-performance in-memory implementation used for a database or something, the constraints would be different.
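To make the arithmetic concrete, here is a rough Python sketch of the idea, not the actual code being discussed. The names `required_bits` and `bloom_indices` are made up for illustration, and carving one SHA-512 digest into k indices by repeated division is just one plausible way to do it, assumed here for the example.

```python
import hashlib
import math

K = 11                 # number of indices per item (from the comment)
M = 5_000_000_000      # approximate bitmap size in bits (from the comment)


def required_bits(k: int, m: int) -> float:
    """Bits of hash output needed to pick k independent indices in [0, m)."""
    return k * math.log2(m)


def bloom_indices(item: str, k: int = K, m: int = M) -> list[int]:
    """Derive k bitmap indices from a single SHA-512 digest of any string.

    Hypothetical helper: the 512-bit digest is treated as one big integer
    and split into k indices with repeated divmod. The bits left over
    beyond the ~354 required act as padding that keeps the bias small.
    """
    digest = hashlib.sha512(item.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    indices = []
    for _ in range(k):
        value, index = divmod(value, m)
        indices.append(index)
    return indices


if __name__ == "__main__":
    need = required_bits(K, M)
    print(f"bits needed: {need:.1f}")
    print(f"fits in SHA-1 (160 bits)?   {need <= 160}")
    print(f"fits in SHA-512 (512 bits)? {need <= 512}")
    # Works on arbitrary strings, whether or not they are already hex hashes.
    print(bloom_indices("password123"))
```

A single SHA-1 digest falls well short of the required bits, while one SHA-512 digest covers all 11 indices with room to spare, which is why hashing the input again is simpler than trying to reuse the 160-bit hash directly.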