| There's a difference between finding a collision and finding a second pre-image. While I agree you shouldn't use MD5, and absolutely shouldn't use a signature algorithm built on it, finding a second pre-image is harder than finding an arbitrary collision with MD5. An "arbitrary collision" here means you can find two inputs (pre-images) which hash to the same value. Like you ran some code and discovered that "SDFKLHKLJxchjasdfgklhjaskdhjlf9" hashed to the same thing as "klhkasdfhjkl899078790". Finding a second pre-image means you start with one fixed message, like "ALL QUIET. REMAIN CALM.", and figure out that "ATTACK AT DAWN 051928" hashes to the same MIC. I can't believe I'm defending using MD5. But... finding second pre-images is still hard. Sasaki & Aoki's pre-image attack has a complexity of around 2^116.9 for a pseudo-pre-image (about 2^123.4 for a full pre-image) and requires 11 * 2^45 words of memory (though roughly 1400 TB, assuming 4-byte words, isn't THAT outlandish these days). Still... statements like "finding a second pre-image is hard" don't age well and all but guarantee a tractable second pre-image attack will be published tomorrow. But... if you have a bunch of docs and you're not signing them or asking people to trust the hash of each doc, you can (reasonably) quickly de-dup by sorting by MD5 hash and then looking for dups. Which is how many people use MD5. And they continue using MD5 because multiple organizations maintain similar hash lists, and changing algorithms would mean getting everyone to move at once. But yeah... at this point we should assume someone will publish a tractable second pre-image attack "any day now" and get to work migrating from MD5 to MD5 : Next Generation. But good luck getting more than 2 people to agree on what the next preferred hash algorithm should be. |
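
For what it's worth, the de-dup trick is only a few lines. Here's a rough Python sketch of "sort by MD5, then look for dups". The function name `find_duplicates` and the in-memory `docs` dict are made up for illustration; a real job would probably stream files from disk instead.

```python
import hashlib
from itertools import groupby
from typing import Dict, List


def find_duplicates(docs: Dict[str, bytes]) -> List[List[str]]:
    """Return groups of document names whose contents share an MD5 digest.

    Hypothetical helper: `docs` maps a document name to its raw bytes.
    """
    # Pair each name with its digest and sort, so equal digests end up adjacent.
    keyed = sorted(
        (hashlib.md5(body).hexdigest(), name) for name, body in docs.items()
    )

    groups = []
    # Walk the sorted list and collect runs that share the same digest.
    for _digest, run in groupby(keyed, key=lambda pair: pair[0]):
        names = [name for _, name in run]
        if len(names) > 1:
            groups.append(names)
    return groups


if __name__ == "__main__":
    sample = {
        "a.txt": b"ALL QUIET. REMAIN CALM.",
        "b.txt": b"ALL QUIET. REMAIN CALM.",
        "c.txt": b"ATTACK AT DAWN 051928",
    }
    print(find_duplicates(sample))
```

This is only reasonable when nobody has an incentive to feed you crafted inputs. If that might not hold, compare the actual bytes within each group before treating two docs as identical, since MD5 collisions are cheap to produce.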