Hacker News new | ask | show | jobs
by Dwedit 902 days ago
MD5 is incredibly broken. The PDF file PoC||GTFO 0x14 (https://dl.packetstormsecurity.net/mag/pocgtfo/pocorgtfo14.p..., 42MB large) is a PDF file that can be also run in a NES emulator, and will display its own MD5 hash. The MD5 hash is also shown in the pdf document itself. (Don't download it from archive.org, their copy is altered)

The fact that any document can contain its own MD5 hash embedded in there should be hugely concerning enough.

The hash also happens to start with 5EAF00D.

4 comments

There's a GIF MD5-quine here: https://news.ycombinator.com/item?id=13823704

And a PNG version too: https://news.ycombinator.com/item?id=32956964

But no one has made an exclusively plaintext (ASCII) MD5-quine yet, and I suspect doing so may be impossible given the characteristics of collision blocks.

How is it impossible? I would think an MD5 quine exists with probability approaching 1 as the size of the document grows to infinity. Think about the reduced problem:

1. a document containing "1", whose hash begins with "1"

2. a document containing "12", whose hash begins with "12"

3. a document containing "123", whose hash begins with "123"

#1 is certain to exist. #2 exists, but would take 16x as long to brute force. #3 would take 16x longer again. If this pattern doesn't continue until 2^128, where would it stop, and why?

All hashes can be brute forced this way, even secure ones SHA-2. Its security relies on the fact that the earth doesn't contain enough computing power to execute a brute force attack within the universe's lifetime.

as the size of the document grows to infinity.

Therein lies the problem.

Also the fact that it would need to be constrained to 7-bit ASCII only, and on top of that be "valid" in its natural language. It's a neat trick to make two documents look completely different with the same hash, but looking at the techniques which are required, they all rely on a binary file format and copious amounts of data which are effectively "hidden" --- all of which do not apply to a text file.

Maybe try to have a look at it as permutations: the mapping "hex of the hash" → "its actual hash" is a (presumably random) permutation. And it's quite probable that such permutation has a fixed point: http://laurentmazare.github.io/2014/09/27/fixed-points-of-ra...

The problem is that we currently don't know how find it more efficiently than with exhaustive search, AFAIK.

Edit: previously on HN: https://news.ycombinator.com/item?id=614079

> a PDF file that can be also run in a NES emulator, and will display its own MD5 hash. The MD5 hash is also shown in the pdf document itself.

By that logic, SHA 256 is also broken:

  $ cat >sha256.py
  from hashlib import sha256
  s = 'from hashlib import sha256\ns = %r\nprint sha256(s%%s).hexdigest()\n'
  print sha256(s%s).hexdigest()
  $ sha256sum sha256.py
  14cc85c420ced317fdb73e9403ac3f6e1d96d19c70ae0dce8da9b8d96fa0b4d3  sha256.py
  $ python sha256.py
  14cc85c420ced317fdb73e9403ac3f6e1d96d19c70ae0dce8da9b8d96fa0b4d3
(Yes, PDF is turing complete. Yes, that's terrible. No, it doesn't have anything to do with hash function deficiencies; it's turing complete on (malicious) purpose, just like webpages with javascript.)
Don't be silly here, the MD5 is clearly in the plaintext here, and the NES ROM is only the first 40k of the file. It is not able to scan itself and print out a hash that way.
> the MD5 is clearly in the plaintext here

Alright, I'll bite: at what byte offset in the binary file contents does a trivial encoding[0] of the MD5 hash occur?

> the NES ROM is only the first 40k of the file. It is not able to scan itself and print out a hash that way.

It is possible to encode the effects of multiple blocks of arbitrary[1] data on a hash function internal state (independently of what state you start in) in much less space than that data actually takes up, although I'll grant that actually doing so is somewhat impressive in it's own right, so I don't have a trivial translation to SHA 256 immediately ready to post.

Edit: tracked down my saved version:

  $ md5sum pocorgtfo14.pdf
  5eaf00d25c14232555a51a50b126746c  pocorgtfo14.pdf
  $ grep -aoi 5eaf00d pocorgtfo14.pdf || echo not found
  not found
  $ # using ...b126746c because 5eaf00... has a nul
  $ grep -aoF $(printf '\xb1\x26\x74\x6c') pocorgtfo14.pdf || echo not found
  not found
The MD5 is definitely not clearly in the plaintext here, though it could be only mildly unclear.

0: Eg, I'd accept 31 34 63 63 38 35 63 34 ... as an encoding of 14cc85c4... from above.

1: Including random/incompressible data.

I stand corrected here, ran it in a debugger, and the NES ROM appears to be doing significant computation between each HEX digit being displayed. Haven't actually read the trace log yet to confirm if it is performing MD5.

NES ROM still doesn't have any access to the rest of the file though.

I dont know about that specific file, but pdf usually gzips its component parts so i wouldn't expect naive grepping to work.
PDF being Turing complete doesn't mean that the program embedded within can access the bytes of the document.
It might be the case that the more complicated the artifact you're trying to forge is, the easier the MD5 forgery gets; the challenge with doing hash forgeries is that you lose control of some of the values and placement of bytes, which is more noticeable/disruptive in a short document than in a file that is a bunch of (seekable, error-recovering) formats at once.

(I've never tried to built one of these, so I could be just totally wrong here).

>The fact that any document can contain its own MD5 hash embedded in there should be hugely concerning enough.

I think that's pretty amazing, to be honest.