by Dylan16807 3925 days ago
Why would you want 'just' a checksum? I want something I can rely on. If I have to dedicate half a core per Gbps of internet-crossing upload, that's not a big deal.

The purpose here is not to secure your data against an attacker (that's what TLS is for), or even against errors in transmission (as others have noted, TLS has you covered there as well). You need something simple and inexpensive to guard against errors in hardware or memory before the data enters that pipeline and after it leaves it. While you shouldn't under-solve a problem, there are real costs to over-solving it as well.
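To make the "simple and inexpensive" option concrete, here is a minimal sketch of an end-to-end checksum computed before the data enters the pipeline and checked after it leaves. The choice of CRC32 and the helper names `checksum`, `verify`, and `flip_bit` are assumptions for illustration; the comment above does not name a specific algorithm.

```python
import zlib


def checksum(data: bytes) -> int:
    """Cheap, non-cryptographic checksum (CRC32 is an assumed choice)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def verify(data: bytes, expected: int) -> bool:
    """Recompute the checksum on the receiving side and compare."""
    return checksum(data) == expected


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit memory or hardware error (for demonstration only)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


payload = b"some upload payload"
crc = checksum(payload)             # computed before the data enters the pipeline
corrupted = flip_bit(payload, 13)   # pretend one bit flipped somewhere along the way

intact_ok = verify(payload, crc)
corrupted_ok = verify(corrupted, crc)
```

This catches accidental corruption such as a flipped bit, but it offers no protection against deliberately constructed inputs, which is the concern raised in the reply.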
You don't need a real attacker to want protection against the failure of assumptions that hold only the vast majority of the time, such as "same hash = same file".

For example, I might have MD5-colliding files somewhere on my hard drive that someone else made as a proof of concept. I honestly don't know. But I would worry about using a storage system that depends on MD5, because what if it deduplicates without checking every byte?
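A rough sketch of what "checking every byte" could look like in a deduplicating store: the hash only narrows down candidates, and two blobs are treated as the same only if their bytes match. `DedupStore`, `md5_hex`, and the injectable `hash_fn` parameter are hypothetical names for illustration, not any real storage system's API.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Default hash for the sketch; MD5 is used only because the comment mentions it."""
    return hashlib.md5(data).hexdigest()


class DedupStore:
    """Hypothetical store that deduplicates by hash but confirms with a byte comparison."""

    def __init__(self, hash_fn=md5_hex):
        self._hash_fn = hash_fn
        self._buckets = {}  # digest -> list of distinct blobs sharing that digest

    def put(self, data: bytes):
        digest = self._hash_fn(data)
        bucket = self._buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == data:  # full byte-for-byte check, not just the hash
                return (digest, index)
        bucket.append(data)  # same digest, different bytes: keep both
        return (digest, len(bucket) - 1)

    def get(self, key) -> bytes:
        digest, index = key
        return self._buckets[digest][index]


# Force a collision with a deliberately terrible hash to show the behavior.
store = DedupStore(hash_fn=lambda data: "always-the-same")
key_a = store.put(b"file A")
key_b = store.put(b"file B")      # collides with A, but is stored separately
key_a_again = store.put(b"file A")  # true duplicate, reuses A's slot
```

A store that skipped the comparison and returned the first entry in the bucket would silently hand back `file A` when asked for `file B`, which is the failure mode described above.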

For the same reason that UTF-16 has encouraged so many broken implementations, at least in a pre-emoji world, it's a bad idea to almost but not quite support convenient features. Either clearly don't support something, or fully support it.