
by OhMeadhbh 902 days ago

There's a difference between finding a collision and finding a second pre-image. While I agree you shouldn't use MD5, and absolutely don't use a signature algorithm which uses it, finding a second pre-image is harder than finding an arbitrary collision with MD5.

An "arbitrary collision" here means you can find two inputs (pre-images) which hash to the same thing. Like you ran some code and discovered that "SDFKLHKLJxchjasdfgklhjaskdhjlf9" hashed to the same thing as "klhkasdfhjkl899078790". Finding a second pre-image means you start with one fixed message, like "ALL QUIET. REMAIN CALM.", and figure out that "ATTACK AT DAWN 051928" hashes to the same MIC.
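To make the difference concrete, here is a minimal sketch that brute-forces both problems against a toy hash. The toy hash (the first 16 bits of MD5), the `BITS` constant and the function names are all assumptions made up for this illustration; nothing here attacks real MD5.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (an assumption, chosen so the demo finishes quickly)


def toy_hash(data: bytes) -> int:
    """First BITS bits of MD5: deliberately weak, for illustration only."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)


def find_collision():
    """Find ANY two distinct inputs with the same toy hash (birthday search)."""
    seen = {}  # toy hash -> first message that produced it
    for tries in count(1):
        msg = b"random-%d" % tries
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg


def find_second_preimage(target: bytes):
    """Find a different input matching the toy hash of one FIXED message."""
    goal = toy_hash(target)
    for tries in count(1):
        msg = b"candidate-%d" % tries
        if msg != target and toy_hash(msg) == goal:
            return msg, tries


if __name__ == "__main__":
    a, b, n_coll = find_collision()
    fixed = b"ALL QUIET. REMAIN CALM."
    other, n_pre = find_second_preimage(fixed)
    print(f"collision after {n_coll} hashes: {a!r} / {b!r}")
    print(f"second pre-image after {n_pre} hashes: {other!r}")
```

For an n-bit hash that behaves randomly, the collision search needs on the order of 2^(n/2) hashes and the second pre-image search on the order of 2^n, so with 16 bits you would expect hundreds of tries for the first and tens of thousands for the second. The sketch only shows that generic gap; it says nothing about the structural weaknesses of full MD5.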

I can't believe I'm defending using MD5. But... finding second pre-images is still hard. Sasaki & Aoki say it's got a complexity of around 2^116.9 and requires 11 * 2^45 words of memory (though 1400 TB isn't THAT outlandish these days.)

Still... statements like "finding a second pre-image is hard" don't age well and will guarantee a tractable second pre-image attack will be published tomorrow.

But... if you have a bunch of docs and you're not signing them or asking people to trust the hash of each doc, you can (reasonably) quickly de-dup by sorting by MD5 hash and then looking for dups. Which is how many people use MD5. And they continue using MD5 because multiple organizations have similar lists and if you wanted to change it, you would need to get everyone to move to a different algorithm.
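A minimal sketch of that de-dup workflow, assuming the docs are files under one directory. The function names `md5_of_file` and `find_duplicates` are hypothetical, and the MD5 here is only a fingerprint for matching, not a security control.

```python
import hashlib
from itertools import groupby
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict:
    """Sort (digest, path) pairs by digest, then report runs longer than one."""
    hashed = sorted(
        (md5_of_file(p), str(p)) for p in root.rglob("*") if p.is_file()
    )
    dups = {}
    for digest, group in groupby(hashed, key=lambda pair: pair[0]):
        paths = [path for _, path in group]
        if len(paths) > 1:
            dups[digest] = paths
    return dups
```

Calling `find_duplicates(Path("docs"))` (the directory name is just an example) returns a mapping from each repeated digest to the paths that share it. If the documents could come from someone adversarial, comparing the bytes of each reported group before deleting anything is a cheap extra check.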

But yeah... at this point we should assume someone will publish a tractable second pre-image attack "any day now" and get to work migrating from MD5 to MD5 : Next Generation. But good luck getting more than 2 people to agree on what the next preferred hash algorithm should be.


bawolff 902 days ago

So if the document is evidence, then it's probably created by the attacker. This seems like a setup where collision is more relevant than 2nd preimage.


OhMeadhbh 902 days ago

How so?


adrian_b 902 days ago

Second preimage attacks are relevant for the documents that you create and give to others.

Keeping a hash of the document ensures that you can prove that any altered document shown by someone else is not the original.

Collision attacks are relevant for the documents created by others, which you receive.

If you have a hash of the document that is collision-resistant, you can trust that the creator does not have other variants of the document with the same hash.

If the hash is not collision resistant, i.e. it is MD5 or SHA-1, you cannot know whether the creator of the document has also prepared a different variant, with the same hash, from the one handed to you.

That is why a digital signature on a document received from others is meaningful only if it is based on a collision-resistant hash.

If you sign and verify your own documents, for detecting modifications, a second preimage attack resistant hash would be enough.
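As a rough sketch of the "keep a hash of your own document" case: record a digest when you hand the document out, then check any copy shown to you later against it. SHA-256 is my assumption here as an example of a hash currently regarded as collision resistant; the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """Digest to record at the time the document is created or handed out."""
    return hashlib.sha256(document).hexdigest()


def is_original(candidate: bytes, recorded: str) -> bool:
    """True only if the candidate hashes to the digest recorded earlier."""
    return hmac.compare_digest(fingerprint(candidate), recorded)


if __name__ == "__main__":
    original = b'{"status":"calm","launch_missiles":false}'
    recorded = fingerprint(original)
    altered = b'{"status":"angry","launch_missiles":true}'
    print(is_original(original, recorded))  # unchanged copy
    print(is_original(altered, recorded))   # altered copy
```

Passing off an altered copy would require a second pre-image of the recorded digest, which is the property this use relies on.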


OhMeadhbh 901 days ago

Again. A "collision" means you have two pre-images which hash to the same value, but you did not pick either of the two pre-images. So if someone gives you a doc that says "SDKLFHJSDJKLGHJKLb9iyasdfkghjasdf97897asdfg798789asd" and then gives you another doc that says "klhjasdfhjklasdfhjkl97879087908789sdfga" and they have the same hash, then... what has the attacker achieved other than proving they've found a collision?

A "second pre-image" means they can give you a document like '{"status":"calm","launch_missiles":false}' and then later come up with another document like '{"status":"angry","launch_missiles":true,"whatever":"a9d7s8gh283g7d7"}' and both would have the same hash.

A critical part of using a hash function is understanding how it can be used. So if I was expecting a message to parse to a JSON blob and you gave me "SDKLFHJSDJKLGHJKLb9iyasdfkghjasdf97897asdfg798789asd", it doesn't matter what the second message's hash is, because you've given me two messages which can't be used.

In the use case you've given, it turns out courts don't look at hashes of messages, they look at messages. So a collision is of limited use for forensics.

In password hashing systems, if you could force someone to use the password "SDKLFHJSDJKLGHJKLb9iyasdfkghjasdf97897asdfg798789asd", you could come in later and use "klhjasdfhjklasdfhjkl97879087908789sdfga" to log in. But you should be pilloried for not using something like a PKCS#5 PBKDF. If you used PBKDF2, for instance, you would now be looking for a second pre-image of the salt prepended with the password. And again, second pre-images are harder than finding a collision.
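For reference, a minimal sketch of salted PBKDF2 storage using Python's standard `hashlib.pbkdf2_hmac`. The HMAC-SHA-256 choice, the 16-byte salt, the `ITERATIONS` value and the helper names are assumptions for illustration, not a vetted configuration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, not a recommendation; tune for your hardware


def hash_password(password: str, salt=None, iterations: int = ITERATIONS):
    """Derive a key from the password with a per-user random salt."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)
```

You would store the salt, the iteration count and the derived key per user, then call `verify_password` with the stored values at login. Because the salt is random and chosen by the server, a pair of colliding inputs prepared in advance does not carry over to the stored value.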

I absolutely agree that a digital signature is only meaningful if it uses a collision-resistant and second-pre-image-resistant hash function. But that's not what we were talking about.

I'm also very happy that the knee-jerk response to MD5 is now "STOP IT BEFORE IT GETS TO THE CHILDREN." A decade ago I had a senior architect say it was okay to use MD5 in new systems because Bruce Schneier's 1996 "Applied Cryptography" said it was okay. I spent the next year moving that app from auth using straight MD5 of the password to an SRP-based system.


bawolff 900 days ago

> A "collision" means you have two pre-images which hash to the same value, but you did not pick either of the two pre-images.

I think the use of the word "you" is ambiguous here (do you mean the attacker? verifier?).

In an attack scenario for a collision attack, you would have an attacker prepare two documents that have the same hash but a different message. Attacker uses the innocent message initially, and then later swaps it to the evil message pretending it was that all along (or vice versa).

Here is the way I could see it happening in a court setting (this is super far-fetched, and there are a bunch of reasons why it wouldn't work in practice):

Attacker, knowing they might end up in court, creates two payloads, one evil, one innocent with same md5 hash.

Attacker uses the evil payload to attack some target

Attacker gets arrested

In court, they put the payload the attacker used into evidence, indexed by its md5 hash

Attacker claims in court that it is all a misunderstanding, all they sent to the server was the innocent payload that just so happens to have the same hash as the evil one.

There's a bunch of (social) reasons why this probably wouldn't work, but this seems just as viable as the 2nd pre-image attack, and unlike the 2nd pre-image attack, actually is viable with md5.
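A toy sketch of the attacker's preparation step, where both payloads are chosen by the attacker and only the padding varies. It uses a made-up 24-bit hash (the first 3 bytes of MD5) so plain brute force is enough; `toy_hash`, `forge_pair` and the example payloads are hypothetical, and against full MD5 this step would need a dedicated collision attack rather than this loop.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes) -> int:
    """First 24 bits of MD5: deliberately weak, for illustration only."""
    return int.from_bytes(hashlib.md5(data).digest()[:3], "big")


def forge_pair(innocent: bytes, evil: bytes):
    """Vary trailing padding on both payloads until their toy hashes match."""
    innocent_seen = {}  # toy hash -> padded innocent payload
    evil_seen = {}      # toy hash -> padded evil payload
    for n in count():
        pad = b" #%d" % n
        i_doc, e_doc = innocent + pad, evil + pad
        h_i, h_e = toy_hash(i_doc), toy_hash(e_doc)
        innocent_seen[h_i] = i_doc
        evil_seen[h_e] = e_doc
        if h_i in evil_seen:
            return i_doc, evil_seen[h_i]
        if h_e in innocent_seen:
            return innocent_seen[h_e], e_doc


if __name__ == "__main__":
    a, b = forge_pair(b"GET /index.html", b"DROP TABLE users;")
    print(a, b, hex(toy_hash(a)), hex(toy_hash(b)))
```

Neither output is random gibberish: each keeps its meaningful prefix and only the trailing counter differs, which is what lets the innocent version stand in for the evil one wherever the two are identified only by hash.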
