|
|
|
|
|
by natch
3267 days ago
|
|
I disagree with the standard dogma around bloom filters that you need multiple hash functions. Just use a simple incrementing salt value to modify the input so you can hash the resulting salted input as many times as you need to, using a different salt value each time. Say you want to hash the string "abc" 8 times. Instead of having 8 hash functions, just take the hash of, say, "abc-0", "abc-1", ... "abc-7". As long as your hash function is of good quality and you're treating the output properly, you don't have to worry about the results from these different inputs having some significant relationship. For cryptographic types maybe I'm using the term "salt" loosely. Whatever, think of another term if you like. Anyway in practice this has worked fine for me. |
|
There've been a couple of other analyses (at least) of the importance of independence among hash functions in probabilistic data structures --- which if I remember correctly was "not as much as you'd think" --- but I don't remember the titles of the papers offhand.