by jandrewrogers
399 days ago
The list of non-cryptographic hash functions that aren’t currently known to be broken for general-purpose hashing is pretty short. There are three dimensions to performance: latency, throughput, and microarchitecture. Algorithm designs explicitly trade off older microarchitecture performance for newer microarchitecture performance, depending on whether the objective of portability is compatibility or consistency.

For latency-optimized hashing, rapidhash is currently the fastest algorithm I know of with consistently good hash quality. Hash quality is outside the cryptographic range, which is expected for a latency-optimized design, but close enough that you can at least make a comparison. It should be pretty portable too. It is what I recommend for general-purpose small-key hashing.

For throughput-optimized hashing, rotohash[0] is currently the fastest algorithm I know of, with single-core throughput saturating CPU cache bandwidth. Hash quality is at the low end of the cryptographic range, about the same as MD5. While it is built on portable 32-bit primitives, it clearly targets modern microarchitectures. The Zen 5 performance metrics, though not listed, are crazy. This is brand new, part of a research project to efficiently and robustly checksum I/O at current hardware line rates (100s of GB/s).

Any algorithm can be faster if you completely disregard quality. In fairness, the quality bar has risen quickly as we’ve become better at designing these algorithms and finding the defects. State-of-the-art algorithms from a decade ago are all considered obviously and hopelessly broken today. Designing a competitive hash function now requires an extremely high degree of skill and expertise. It was much easier to come into it as a hobbyist and produce an acceptable result ten years ago.

[0] https://github.com/jandrewrogers/RotoHash
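
To make "hash quality" a bit more concrete, here is a minimal sketch of one common quality check, an avalanche test: flip a single input bit and count how many output bits change (ideally about half). This is only an illustration, not how rapidhash or rotohash were evaluated. The helper names `md5_64`, `byte_sum_64`, and `avalanche` are made up for this example, MD5 stands in only because it is the reference point mentioned above, and the byte-sum function is a deliberately bad toy hash included for contrast.

```python
import hashlib
import os

MASK64 = 0xFFFFFFFFFFFFFFFF


def md5_64(data: bytes) -> int:
    # First 8 bytes of the MD5 digest, read as a 64-bit integer.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "little")


def byte_sum_64(data: bytes) -> int:
    # Hypothetical toy hash: very fast, very poor quality (sum of bytes mod 2**64).
    return sum(data) & MASK64


def avalanche(hash_fn, key_len: int = 16, trials: int = 50) -> float:
    """Mean fraction of the 64 output bits that flip when one input bit flips."""
    flipped = 0
    total = 0
    for _ in range(trials):
        key = bytearray(os.urandom(key_len))
        base = hash_fn(bytes(key))
        for bit in range(key_len * 8):
            key[bit // 8] ^= 1 << (bit % 8)   # flip one input bit
            flipped += bin(base ^ hash_fn(bytes(key))).count("1")
            key[bit // 8] ^= 1 << (bit % 8)   # restore the original key
            total += 64
    return flipped / total


if __name__ == "__main__":
    print("md5 (64-bit slice):", round(avalanche(md5_64), 3))
    print("byte sum:          ", round(avalanche(byte_sum_64), 3))
```

The MD5 slice should land very close to 0.5, while the byte sum stays far below it, which is the "faster if you completely disregard quality" case. Real test suites run many more checks than this one.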