by tgamblin 898 days ago
The article mentions the key detail: MD5 is broken for cryptography (collisions) but not for second preimage attacks. I was hoping there would be some discussion of just how much more difficult the latter is. It is extremely difficult.

Let’s ignore that no second preimage attack is currently known for MD5. The software the author links to has a FAQ that links to a paper that lays out the second preimage complexity for MD4:

https://who.paris.inria.fr/Gaetan.Leurent/files/MD4_FSE08.pd...

It takes 2^102 hashes to brute force this for MD4, which is weaker than MD5. A bitcoin Antminer K7 will set you back $2,000, and it gets 58 TH/s for sha256, which is slower than MD5 or MD4. Let’s ignore that MD5 is more complex than MD4, and let’s say conservatively that similar hardware might be twice as fast for MD5 (SHA256 is really only 20-30% slower on a cpu). It’ll take 2^102/58e12/2/60/60/24/365, or about 1.4 billion years to do a second preimage attack with current hardware. So you could do that 3 times before the sun dies.
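For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. It uses only the figures from the comment (2^102 hashes, 58 TH/s, a 2x speedup for MD5); the function and variable names are made up for illustration.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_to_search(total_hashes: float, hashes_per_second: float) -> float:
    """Time to evaluate total_hashes at a constant rate, in years."""
    return total_hashes / hashes_per_second / SECONDS_PER_YEAR


md4_second_preimage_work = 2**102  # hash evaluations, from the linked MD4 paper
k7_sha256_rate = 58e12             # hashes per second for one miner, as quoted above
assumed_md5_speedup = 2            # the "twice as fast for MD5" guess from the comment

years = years_to_search(md4_second_preimage_work,
                        k7_sha256_rate * assumed_md5_speedup)
print(f"{years:.2e} years")  # about 1.4 billion years
```

Changing `assumed_md5_speedup` shows how little the conclusion depends on it: even a 1000x speedup still leaves over a million years on a single device.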

If you want to reduce that to 1.4 years, you could maybe buy a billion K7s for $2 trillion. And each requires 2.8kW so you’ll need to find 2.8 terawatts somewhere. That’s 34 trillion kWh for 1.4 years. US yearly electricity consumption is about 4 trillion kWh.
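The scaled-up numbers can be checked the same way. This is a rough sketch using only the figures quoted above ($2,000 and 2.8 kW per miner, a billion miners, 1.4 years, 4 trillion kWh per year for the US); all names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

miners = 1e9                  # a billion devices
price_per_miner_usd = 2_000   # as quoted above
power_per_miner_kw = 2.8      # as quoted above
run_time_years = 1.4
us_yearly_kwh = 4e12          # figure quoted in the comment

total_cost_usd = miners * price_per_miner_usd    # 2e12, i.e. $2 trillion
total_power_kw = miners * power_per_miner_kw     # 2.8e9 kW, i.e. 2.8 TW
energy_kwh = total_power_kw * run_time_years * HOURS_PER_YEAR
us_years_of_consumption = energy_kwh / us_yearly_kwh

print(f"cost: ${total_cost_usd:.1e}")
print(f"energy: {energy_kwh:.2e} kWh")
print(f"equivalent years of US consumption: {us_years_of_consumption:.1f}")
```

So the run would use roughly eight to nine times the quoted US yearly figure.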

It will be a while, probably decades or more, before there’s a tractable second preimage attack here.

Yes, there are stronger hashes out there than MD5, but for file verification (which is what it’s being used for) it’s fine. Safe, even. The legal folks should probably switch someday, and it’ll probably be convenient to do so since many crypto libraries won’t even let you use MD5 unless you pass a “not for security” argument.
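As one example of the “not for security” argument: Python's `hashlib` (3.9 and later) accepts a `usedforsecurity=False` keyword, which lets MD5 be used for plain checksums on builds that would otherwise block it. Below is a minimal sketch of file verification along those lines; `file_digest` and its parameters are names invented for this example, not part of any tool mentioned in the article.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return the hex digest."""
    if algorithm == "md5":
        # Explicitly flag MD5 as a non-security checksum (Python 3.9+).
        h = hashlib.new("md5", usedforsecurity=False)
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns the empty bytes sentinel.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published one."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

Since the algorithm is just a parameter here, switching a checksum list from md5 to sha256 is mostly a matter of regenerating the expected values.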

But there’s no crisis. They can take their time.


> The article mentions the key detail: MD5 is broken for cryptography (collisions) but not for second preimage attacks.

The problem with this argument is that people often don't properly understand the security requirements of systems. I can't count the number of times I've seen people say "md5 is fine for use case xyz" where in some counterintuitive way it wasn't fine.

And tbh, I don't understand the urge of people to defend broken hash functions. Just use a safe one, even if you think you "don't need it". Choosing a secure hash function has no downsides, and it's far easier to do that than to actually show that you "don't need it" (instead of just having a feeling you don't need it).

In the unlikely event that you think performance matters (which is unlikely, as cryptographic hash functions are so fast that it's really hard to build anything where the difference between md5 and sha256 matters), even that's covered: blake3 is faster than md5.

> I can't count the number of times I've seen people say "md5 is fine for use case xyz" where in some counterintuitive way it wasn't fine.

I can count many more times that people told me that md5 was "broken" for file verification when, in fact, it never has been.

My main gripe with the article is that it portrays the entire legal profession as "backwards" and "deeply negligent" when they're not actually doing anything unsafe -- or even likely to be unsafe. And "tech" apparently knows better. Much of tech, it would seem, has no idea about the use cases and why one might be safe or not. They just know something's "broken" -- so, clearly, we should update immediately or risk... something.

> Just use a safe one, even if you think you "don't need it".

Here's me switching 5,700 or so hashes from md5 to sha256 in 2019: https://github.com/spack/spack/pull/13185

Did I need it? No. Am I "compliant"? Yes.

Really, though, the main tangible benefit was that it saved me having to respond to questions and uninformed criticism from people unnecessarily worried about md5 checksums.

>And "tech" apparently knows better.

The tech community has a massive problem with Dunning-Kruger, and has for basically ever. Hell, two decades ago when I was a young guy working in the field, so did I.

I'm not sure if it's because the field is basically a young man's game and that's inherent with relative youth, or if there's something deeper going on, but it's hard to ignore once you notice it.

That said, the idea that you have a better handle on what's going on in the legal system, and on the needs/uses legal professionals have, than actual people in the legal profession and academics in the legal field is a pretty big leap even with those priors.

> I can't count the number of times I've seen people say "md5 is fine for use case xyz" where in some counterintuitive way it wasn't fine.

Help us out by describing a time when this happened. MD5's weaknesses are easily described, and importantly, it is still (second) preimage resistant.

I agree that upgrading is likely your best bet. But I've found the other direction of bad reasoning is a more pernicious trap to fall into. "My system uses bcrypt somewhere so therefore it is secure" and the like is often used as a full substitute for thinking about the entirety of the system.

> MD5's weaknesses are easily described, and importantly, it is still (second) preimage resistant

Most devs have no idea what that means, but most devs still need to use hash functions. They need to use primitives that match their mental model of a hash function. Said model is https://en.m.wikipedia.org/wiki/Random_oracle

The usual answer here is "don't roll your own crypto", but in practice abstinence-only cryptography education doesn't work.

> Help us out by describing a time when this happened.

Linus Torvalds saying that SHA-1 is okay for Git, while it is used for Git signatures as well. Signatures are a classic "you need collision resistance to have safe signatures, but people are often confused about it" case.

I might be mistaken, but wouldn't a git signature already be signing trusted things (i.e. the person making the original signature is trusted), making any attack enabled by the input hash function a second preimage attack (i.e. an attacker only knows the trusted input, not anything private like the signing key)?

Hash collisions mean you can't trust signatures from _untrusted_ sources, but git signatures don't seem to fit that situation.

As you pointed out, signatures make content trusted, but only to the degree of the algorithm's attack resistance. I think it's also important to define trust; for our purposes this means: authenticity (the signer deliberately signed the input) and integrity (the input wasn't tampered with).

If an algorithm is collision resistant, a signature guarantees both authenticity and integrity. If it's just second preimage resistant, signing may only guarantee authenticity.

Now, the issue with Git using SHA-1 is that an attacker may submit a new patch to a project, rather than attack an existing commit. In that case they are in control of both halves of the collision, and they just need the benign half to be useful enough to get merged.

Any future commits with the file untouched would allow the attacker to swap it for their malicious half, while claiming integrity thanks to the maintainers' signatures. They could do this by either breaching the official server or setting up a mirror.

One interesting thing to note though: in the case of human readable input such as source code, this attack breaks down as soon as you verify the repo contents. Therefore it's only feasible longer term when using binary or obfuscated formats.
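To make the "both halves" point concrete, here is a toy sketch of the cost gap between the two attacks. It is not an attack on MD5 or SHA-1: it truncates SHA-256 to 16 bits (an assumption chosen so that brute force finishes in well under a second), and the message formats are made up.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size; real digests are 128 bits or more


def toy_hash(data: bytes) -> int:
    """Top BITS bits of SHA-256(data), standing in for a weak hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)


def find_collision():
    """Attacker chooses BOTH messages: birthday search, about 2^(BITS/2) tries."""
    seen = {}
    for i in count():
        msg = b"patch-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_second_preimage(target: bytes):
    """Attacker must match ONE fixed message: about 2^BITS tries."""
    want = toy_hash(target)
    for i in count():
        msg = b"forgery-%d" % i
        if msg != target and toy_hash(msg) == want:
            return msg, i + 1


if __name__ == "__main__":
    a, b, collision_tries = find_collision()
    forged, preimage_tries = find_second_preimage(b"already-signed-commit")
    print("collision tries:      ", collision_tries)
    print("second preimage tries:", preimage_tries)
```

On a typical run the collision takes a few hundred tries and the second preimage tens of thousands. The same square-root relationship at 128 bits is the difference between 2^64 and 2^128 for generic attacks, before counting any structural weaknesses in a specific hash.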

> And tbh, I don't understand the urge of people to defend broken hash functions. Just use a safe one, even if you think you "don't need it".

The ideal discourse would not imply a binary sense of "safety" at all, much less for a function evaluated outside the context and needs of its usage....

The thing is: We have a binary definition of safety for cryptographic hash functions, and it works well.

You can add a non-binary sense of safety to cryptographic hash functions, but it makes stuff a lot more complicated for no good reason. If you use the "preimage-safe-but-not-collision-safe" ones, you need to do a lot more analysis to show safety of your whole construction. You could do that, but it gives you no advantage.

Second preimage attacks aren't the only threat in a forensics environment.

Also, hand-wavy extrapolations from Bitcoin miners aren't a reliable estimate of how fast & energy-efficient dedicated MD5 hardware could become.

Which part was hand-wavy/unreasonable? Do you think that dedicated MD5 hardware could become billions or even millions of times more efficient within a decade? If so, why?

MD5 is already not "fine" or "safe, even" against malicious actors who might pre-prepare collisions, or pre-seed their documents with the special constructs that make MD5 manipulable to collision-attacks.

Even if your extrapolative method was sound, you've already got several factors wrong. The best SHA256 Bitcoin miners are today more than twice your estimate in hashrate, and on plain CPUs SHA256 is more like 4x slower than MD5. (Your smaller estimate of MD5's speed advantage is likely derived from benchmarks where there's special hardware support for SHA256, but not MD5, as common in modern processors.)

But it's also categorically wrong to think the CPU ratio is a good guide to how hardware optimizations would fare for MD5. The leading Bitcoin miners already use a (patented!) extra 'ASICBoost' optimization to eke out extra parallelized SHA256 tests, for that use-case, based on the internals of the algorithm. MD5 is a smaller, simpler algorithm, also with various extra weaknesses! So there's no telling how many times faster dedicated MD5 hardware (either for generically calculating hashes or with special adaptations for collision-search) might run, with similar at-the-gates, on-the-die cleverness.

Further, attacks only get better & theory breakthroughs continue. Since MD5 is already discredited amongst academics & serious-value-at-risk applications – and has been since 1996, when expert cryptographers began recommending against its use in new work – there's not much above-ground scholarly/commercial activity refining attacks further. The glory & gold has mostly moved elsewhere.

But taking solace in the illusory lack-of-attacks from that situation is foolhardy, as is pronouncing, without reasoning, that it's "probably decades or more" before second-preimage attacks are practical. Many thought that with regard to collision attacks versus SHA1 – but then the 1st collision arrived in 2017 & now they're cheap.

You can't linear-extrapolate the lifetime of a long-disrecommended cryptographic hash that has already failed several of its original design goals. Like a bridge built with faulty math or tainted steel, it might collapse tomorrow, or 20 years from now. Groups in secret may already have practical attacks – this sort of info has large private value! – waiting for the right time to exploit, or currently only exploiting in ways that don't reveal their private capability.

You are right that there's no present 'crisis'. But it could arrive tomorrow, causing a chaotic mad-dash to fix, putting all sorts of legal cases/convictions/judgements in doubt. Evidentiary systems should be providing robust authentication/provenance continuity across decades, as that's how long cases continue, or even centuries, for related historical/policy/law issues to play out.

Good engineers won't wait for a crisis to fix a longstanding fragility in socially-important systems, or deploy motivated-reasoning wishful-thinking napkin-estimates to rationalize indefinite inaction.

If I understand this correctly, the paper only shows a particular attack of complexity 2^102. Someone may find a different attack with much lower complexity. That's the usual way cryptography gets broken -- people find better and better attacks, and suddenly the latest attack has low enough complexity to be practical.