by EthanHeilman 4291 days ago
>Finding a collision in MD5 is costly, finding a collision in MD5 which is within -+10% of the actual size is extremely costly (technically possible, but maybe not in your lifetime).

MD5 collisions within 10% of the size of the file can be found in seconds on an old laptop computer. I've done it; we assign it as homework in class.

Read this http://www.mathstat.dal.ca/~selinger/md5collision/

Notice that the two colliding exe are exactly the same file size. These attacks have only gotten better.

>Zip is an extremely good format for crafting fake files which match a checksum. Really any format which can take arbitrary metadata (which is MOST) is pretty easy.

The example I gave uses windows and linux executables. No zip files in sight. These attacks are from 2009.


> Notice that the two colliding exe are exactly the same file size. These attacks have only gotten better.

They're also 6, not 200+ KB. They have been specially crafted to be as small as possible to make the problem set as easy as possible.

> The example I gave uses windows and linux executables. No zip files in sight. These attacks are from 2009.

That's a really strange reply. What is it you think I said? To quote you quoting me: "Really any format which can take arbitrary metadata (which is MOST) is pretty easy."

So why you felt the need to point out that it is an executable not a zip file is uhh strange to say the least...

>They're also 6, not 200+ KB. They have been specially crafted to be as small as possible to make the problem set as easy as possible.

That is not how it works. MD5 is vulnerable to length extension attacks[0]. Once you collide part of an MD5 hash, if everything that follows that collision is the same, it can be as long as you want. Colliding large files is just as easy as colliding small files. You could perform the same exercise with 1GB executables.

[0]: http://en.wikipedia.org/wiki/Length_extension_attack
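
A small sketch of the property this comment relies on: in an iterated (Merkle-Damgård style) hash, two equal-length, block-aligned messages that reach the same internal state stay collided when the same suffix is appended to both. The code does not attack MD5 itself. It uses a hypothetical toy hash (`toy_hash`, with a 24-bit chaining state carved out of `hashlib.md5`) so the initial collision can be brute-forced almost instantly. `BLOCK`, `STATE`, `IV`, `compress`, and `find_block_collision` are illustrative names, not anything from the linked page.

```python
import hashlib
import struct
from itertools import count

BLOCK = 8             # toy block size in bytes (assumption, MD5 uses 64)
STATE = 3             # toy chaining state in bytes (assumption, MD5 uses 16)
IV = b"\x00" * STATE  # toy initial value


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated MD5 of state || block."""
    return hashlib.md5(state + block).digest()[:STATE]


def toy_hash(msg: bytes) -> bytes:
    """Iterated hash with length padding, mimicking MD5's overall structure."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += struct.pack(">Q", len(msg))
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision():
    """Brute-force two distinct single blocks with the same chaining state."""
    seen = {}
    for i in count():
        block = struct.pack(">Q", i)
        out = compress(IV, block)
        if out in seen:
            return seen[out], block
        seen[out] = block


a, b = find_block_collision()
suffix = b"same tail " * 10_000  # 100,000 identical bytes appended to both

print(a.hex(), b.hex())
print(toy_hash(a + suffix) == toy_hash(b + suffix))
```

The suffix length does not affect the search cost: all the work goes into finding `a` and `b`, and hashing the shared tail afterwards is ordinary hashing. Note that this only works when the attacker picks both colliding prefixes.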

> Once you collide part of an MD5 hash, if everything that follows that collision is the same, it can be as long as you want. Colliding large files is just as easy as colliding small files.

I've read that three times, still don't follow what you're getting at. That isn't how length extension attacks work/can be utilised.

Please go ahead and generate a file that collides with any of the linked files and is the same file size. The content doesn't have to be valid or readable, junk/binary is fine. If you can do this in a reasonable period of time (e.g. 24 hrs) then your point would have been proven.

The smallest is 224K with a hash of 180caf23dd71383921e368128fb6db52.

That's not what a collision attack[1] is. You're probably thinking of a pre-image attack[2].

[1] http://en.wikipedia.org/wiki/Collision_attack

[2] http://en.wikipedia.org/wiki/Preimage_attack
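
To make the distinction concrete, here is a rough sketch on a deliberately weakened hash: MD5 truncated to 16 bits, an assumption chosen only so both searches finish quickly. `h`, `find_collision`, `find_second_preimage`, and the `target` string are hypothetical names for illustration. In a collision search the attacker chooses both messages; in a second-preimage search one message is fixed by someone else.

```python
import hashlib
from itertools import count


def h(data: bytes) -> int:
    """Weakened hash: first 16 bits of MD5 (illustration only)."""
    return int.from_bytes(hashlib.md5(data).digest()[:2], "big")


def find_collision():
    """Attacker controls both messages: any two inputs with equal hash."""
    seen = {}
    for i in count():
        m = b"c%d" % i
        d = h(m)
        if d in seen:
            return seen[d], m, i + 1
        seen[d] = m


def find_second_preimage(target: bytes):
    """Attacker controls one message: must match a hash fixed by someone else."""
    want = h(target)
    for i in count():
        m = b"p%d" % i
        if m != target and h(m) == want:
            return m, i + 1


m1, m2, collision_tries = find_collision()
target = b"someone else's file"
forged, preimage_tries = find_second_preimage(target)

print("collision tries:", collision_tries)
print("second-preimage tries:", preimage_tries)
```

For an ideal n-bit hash the generic costs are roughly 2^(n/2) tries for a collision and 2^n for a second preimage, so the gap for full 128-bit MD5 is far larger than for this 16-bit toy. The practical MD5 collision attacks exploit structural weaknesses rather than brute force, which this sketch does not model.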

I didn't use the expression "collision attack" ever in this thread. I quoted someone else who used that term, however (and the context of the whole discussion is clearly related to preimage attacks, not collision attacks).

That's if you're generating both sides. If someone has a file and I want to generate a new file with a matching MD5, that's a lot harder.

MD5 is vulnerable to both collision attacks and targeted collision attacks. We can imagine both in the WikiLeaks case. You are correct that targeted collision attacks are more difficult, but they have been done in research for many years now[0] (2006) and they are showing up in the wild as well[1] (2012).

[0]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140....

[1]: http://blogs.technet.com/b/srd/archive/2012/06/06/more-infor...

Those are both chosen-prefix attacks. They're impressive, but not relevant to this case where one file is completely out of the attacker's control.