|
|
|
|
|
by tveita
3266 days ago
|
|
> I disagree with the standard dogma around bloom filters that you need multiple hash functions. There is no dogma there, just a confusion of terminology.
Your "abc-0", "abc-1", ... "abc-7" could be considered a family of hash functions, with the suffix as a parameter. "Use this algorithm on the input + a suffix of '-0'" is a different hash function than "Use this algorithm on the input + a suffix of '-1'" Similarly, splitting a long or infinite length output of a hash function into smaller pieces is a perfectly acceptable way of constructing multiple independent hash functions. When a paper talks about "multiple hash functions", that means exactly what it says, but no one is saying that you need multiple hash algorithms. It might mean making multiple hash functions using universal hashing, or through techniques like the "Less Hashing, Same Performance" paper people have linked to, or just using a prefix/suffix on the input like you suggest. No one is using CRC32 and MD5 and SHA1 together as their multiple hash functions, that's silly, unnecessary, complicated and doesn't scale. |
|