|
|
|
|
|
by thomasahle
1105 days ago
|
|
I'm disappointed a the hashing is just based on training on microbenchmarks and SMHasher, rather than designing a fast _provably_ universal hash.
Suites like SMHasher are never complete. They are just trying to catch the most common weaknesses. If you train on the test cases you'll only get an algorithm that passes the tests, but people can always find a set of values on which you will do badly. |
|
For the other 99.999% of hashing applications there is a balance between collision resistance and hashing latency. For example, in a hash table (probably the most common use for a non-cryptographic hash function) there is a cost incurred by hash collisions because lookups on keys with collisions may have to do extra probing. On the other hand, every hash table lookup requires doing at least one hash operation, regardless of whether or not it collides. So it may make sense to have a slightly worse hash function (in the sense that it is more likely to have collisions with pathological inputs) if it has slightly lower latency. The only way to really know what is faster for a real world application is to have some kind of benchmark to train against as a loss function.