| > I think you're confusing proving that the hash function is collision resistant with the other goal which is hashing speed. If you really need a collision resistant hash you need to use a cryptographic hash function. I wish this misconception would die. There is a great theory of algorithmic probabilistic hash functions, completely distinct from cryptographic hash functions. If you are designing a hash table, or a different algorithm using a hash function, you nearly always want the former kind. The idea is that `Pr[h(x) = h(y)]` is small _no matter the inputs x and y_.
Here the probability is over the random seed of h.
Lots of good hash functions, like UMASH (https://engineering.backtrace.io/2020-08-24-umash-fast-enoug...) has this guarantee.
Other fast hash functions, like MURMUR don't. When a function doesn't have this guarantee, it means I can find sets of values x1, x2, ... that will likely collide under _any_ or most seeds!
Sure, if your inputs are basically random, this probably won't happen, but people can still use this to DDoS your hash table, or whatever you are coding. Notice again, this has nothing to do with cryptography. It is all about probabilistic guarantees.
You can't just test the hash function on a fixed number of inputs and say it's good, since you may just have moved the "bad set" to somewhere else. In this day and age there are super fast algorithmic hash functions with guaranteed low expected collisions. It's just silly to use one that you can break so easily. |
That sounds like such a function is strongly collision resistant, which means it's also second preimage resistant. And that gets you most of the way to a cryptographic hash function.
Is the only difference that it doesn't have to be first preimage resistant? Compared to cryptographic hashes, does that expand the set of viable functions a lot, to allow first preimages while still not allowing second preimages?
> It is all about probabilistic guarantees
So are cryptographic hash functions.
When I search for `algorithmic probabilistic hash functions` I just get results about bloom filters.