Despite the thousands of pages of ISO 32000, the reality is that the format is not defined. Acrobat tolerates unfathomably malformed PDF files generated by old software written before the standard was opened up, back when people were reverse-engineering it. There's always some utterly insane file that Acrobat opens just fine, and now you get to play the game of figuring out how Acrobat repaired it.

Plus there's all the fun of the fact that you can embed the following formats inside a PDF: PNG, JPEG (including CMYK), JPEG 2000 (dead), JBIG2 (dead), CCITT G4 (dead, fax machines), PostScript Type 1 fonts (dead), PostScript Type 3 fonts (dead), PostScript CIDFonts (pre-Unicode, dead), CFF fonts (the inside of an OTF), TrueType fonts, ICC profiles, PostScript functions defining color spaces, XML forms (the worst), LZW-compressed data, run-length-compressed data, and Deflate-compressed data. Acrobat will accept every one of these malformed in various non-standard ways, so you need to write your own parsers. Note the lack of OpenType fonts, and also the lack of proper Unicode!
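To make the "write your own parsers" point concrete, here is a minimal sketch of a lenient decoder for the run-length filter (`RunLengthDecode` in the spec). The encoding itself is simple: a length byte of 0 to 127 means "copy the next length + 1 bytes literally", 129 to 255 means "repeat the next byte 257 - length times", and 128 marks end of data. The function name `run_length_decode` is hypothetical, and the leniency rules (treat a missing EOD as end of data, keep a truncated final literal run, drop a dangling repeat header) are assumptions about what "tolerant" should mean, not a description of what Acrobat actually does.

```python
def run_length_decode(data: bytes) -> bytes:
    """Hypothetical lenient decoder for PDF run-length compressed data.

    Leniency choices (assumed, not taken from Acrobat):
      - a missing EOD (128) byte just ends decoding at end of input
      - a truncated literal run keeps whatever bytes are present
      - a repeat header with no byte after it is silently dropped
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        length = data[i]
        i += 1
        if length == 128:
            # EOD marker: anything after it is ignored.
            break
        if length < 128:
            # Literal run: copy the next length + 1 bytes (may be short if truncated).
            out += data[i:i + length + 1]
            i += length + 1
        else:
            # Repeat run: the next single byte is repeated 257 - length times.
            if i >= n:
                break  # dangling header with no byte to repeat
            out += bytes([data[i]]) * (257 - length)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    # Literal "ABC", then "x" repeated three times, then EOD.
    print(run_length_decode(bytes([2, 65, 66, 67, 254, 120, 128])))
```

A strict decoder would raise on the last three cases instead; the lenient version trades correctness guarantees for opening more real-world files, which is exactly the trade-off the comment is complaining about.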
Not sure what you mean by "dead", but tons of book scans, particularly those on archive.org, are PDFs made up entirely of JPEG 2000 images.