Hacker News
by ghshephard 3926 days ago
Apparently, SHA-1 is pretty slow compared to other hashes: about 20x slower than the fastest hash algorithms out there.

https://github.com/Cyan4973/xxHash

You would think that if it's just being used as a checksum, anything that passes SMHasher (https://code.google.com/p/smhasher/wiki/SMHasher) with high marks would be sufficient.
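
For a rough feel of the speed gap described above, here is a minimal sketch that times SHA-1 against a cheap checksum using only the Python standard library. CRC-32 stands in for a fast non-cryptographic hash here, because xxHash needs a third-party package; the helper names `throughput_mb_s` and `compare` are made up for this example, and the numbers will vary a lot by CPU and library build.

```python
import hashlib
import time
import zlib


def throughput_mb_s(fn, data, repeats=5):
    """Return the best observed throughput of fn(data) in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading on very coarse timers.
    return len(data) / max(best, 1e-9) / 1e6


def compare(size=16 * 1024 * 1024):
    """Time SHA-1 and CRC-32 over the same zero-filled buffer."""
    data = bytes(size)
    return {
        "sha1": throughput_mb_s(lambda d: hashlib.sha1(d).digest(), data),
        # CRC-32 is an assumption of this sketch, standing in for xxHash.
        "crc32": throughput_mb_s(zlib.crc32, data),
    }


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name}: {rate:.0f} MB/s")
```

Taking the best of several runs filters out some scheduler noise, but this is still only a ballpark measurement, not a proper benchmark.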


Why would you want 'just' a checksum? I want something I can rely on. If I have to dedicate half a core per Gbps of internet-crossing upload, that's not a big deal.
The purpose here is not to secure your data against an attacker (that's what TLS is for), or even against errors in transmission (as others have noted, TLS has you covered there as well). You need something simple and inexpensive to guard against errors in hardware or memory before and after the data enters that pipeline. While you shouldn't under-solve a problem, there are real costs to over-solving it as well.
You don't need a real attacker to want protection when an assumption like "same hash = same file" fails, even if it holds the vast majority of the time.

For example, I might have MD5-colliding files on my hard drive somewhere that someone else made as a proof of concept. I honestly don't know. But I would worry about using a storage system that depends on MD5, because what if it deduplicates without checking every byte?
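
To make the "checking every byte" worry concrete, here is a minimal sketch of a deduplicating store that treats a matching hash only as a hint and compares the actual bytes before merging two blobs. `DedupStore` and its methods are hypothetical names invented for this example, not any real storage system's API, and it keeps everything in memory for simplicity.

```python
import hashlib


class DedupStore:
    """Toy in-memory store: dedup by hash, but verify bytes on a match."""

    def __init__(self, hash_fn=None):
        # The hash is pluggable so a deliberately weak one can force collisions.
        self._hash = hash_fn or (lambda d: hashlib.sha1(d).hexdigest())
        self._buckets = {}  # digest -> list of distinct blobs with that digest

    def put(self, data):
        """Store data and return a (digest, index) key."""
        digest = self._hash(data)
        bucket = self._buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == data:  # byte-for-byte check, not just the digest
                return (digest, index)
        # Same digest but different content: keep both instead of merging.
        bucket.append(data)
        return (digest, len(bucket) - 1)

    def get(self, key):
        """Return the blob stored under a key produced by put()."""
        digest, index = key
        return self._buckets[digest][index]


if __name__ == "__main__":
    # A one-byte "hash" that collides for any reordering of the same bytes.
    weak = lambda d: format(sum(d) % 256, "02x")
    store = DedupStore(hash_fn=weak)
    first = store.put(b"ab")
    second = store.put(b"ba")  # collides with b"ab" under the weak hash
    print(first, second, store.get(second))
```

A store that skipped the comparison inside `put` would hand back `b"ab"` when asked for the second blob, which is exactly the silent corruption being worried about.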

For the same reason that UTF-16 has encouraged so many broken implementations, at least in a pre-emoji world, it's a bad idea to almost but not quite support convenient features. Either clearly don't support something, or fully support it.