Hacker News
by gcanyon 544 days ago
The real question is: why is everything stuck in PDFs? And the more important meta-question is: why don't PDFs support metadata (they do, somewhat)? So much of what we do is essentially machine-to-machine, but it's trapped in a format designed entirely for human-to-human communication (with a bit of machine-to-human thrown in).

Adobe has had literally a third of a century to recognize this need and address it. I don't think they're paying attention :-/


PDFs can have arbitrary files embedded, like XML and JSON. The format also supports a logical structure tree (which doesn’t need to correspond to the visual structure) that can carry arbitrary attributes (data) on its structure elements. And then there’s XFA (XML Forms Architecture). You can really put pretty much anything machine-processable you want into a PDF. One could argue that it is too flexible: any design you come up with that uses those features for a particular application is unlikely to be very interoperable.
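To make the embedded-file mechanism concrete, here is a minimal sketch using only the Python standard library. It writes a one-page PDF that carries a JSON attachment: the document catalog's /Names /EmbeddedFiles entry points at a name tree mapping a file name to a file specification, whose /EF entry references the embedded data stream. Assumptions: the function name, the `data.json` file name and the sample record are hypothetical; the stream is left uncompressed and the file name is assumed to need no escaping. Real tooling would also handle compression, Unicode file names (/UF) and escaping.

```python
import json


def build_pdf_with_attachment(name: str, payload: bytes) -> bytes:
    """Return a minimal one-page PDF that embeds `payload` under `name`.

    Assumes `name` is plain ASCII with no parentheses or backslashes, so it
    can be written as a PDF literal string without escaping.
    """
    fname = name.encode("ascii")
    objects = [
        # 1: catalog; /Names /EmbeddedFiles points at the attachment name tree
        b"<< /Type /Catalog /Pages 2 0 R /Names << /EmbeddedFiles 4 0 R >> >>",
        # 2: page tree holding a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: an empty page; the human-readable rendering would be drawn here
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 612 792] >>",
        # 4: name tree (a single root node) mapping the file name to its file spec
        b"<< /Names [(" + fname + b") 5 0 R] >>",
        # 5: file specification; /EF holds the embedded file stream
        b"<< /Type /Filespec /F (" + fname + b") /EF << /F 6 0 R >> >>",
        # 6: the embedded file stream, left uncompressed for readability
        b"<< /Type /EmbeddedFile /Subtype /application#2Fjson /Length %d >>\n"
        % len(payload) + b"stream\n" + payload + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj", for the xref
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # fixed-width 20-byte xref entries
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n" % xref_pos + b"%%EOF\n"
    return bytes(out)


# Hypothetical sample record standing in for an application's structured output.
record = {"invoice": "INV-001", "total": 42.5}
pdf_bytes = build_pdf_with_attachment("data.json", json.dumps(record).encode())
```

Writing `pdf_bytes` to disk and opening it in a viewer with an attachments panel should list `data.json`; a consuming program can then read the JSON directly instead of scraping the rendered page.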
Nice! I looked for metadata and found only an anemic thing, but embedding a whole file with structured data makes perfect sense.

But of course this only pushes the responsibility back a step: why the heck isn’t Adobe pushing developers to include structured data in their output?

Every time you “Save as PDF” there should be a checkbox defaulted on to “Save Data to PDF”.

They’ve already done the first thing; why not do the second thing to make everyone’s life easier?

Adobe wants you to purchase Acrobat to be able to do that. Their strategy is to give you Reader for free, but for authoring they want you to buy their software. However, there’s also third-party PDF software that can do this. And apparently Google Drive supports it too: https://www.wikihow.com/Attach-a-File-to-a-PDF-Document
PDFs are essentially compressed PostScript, which is Turing-complete, so in theory a PDF can do anything you want.
The big distinction between PostScript and PDF was the removal of the language operators[1]. Adobe Distiller unrolled the PostScript language to create a file without the code.

[1] A small part remained: PostScript calculator functions (PDF Type 4 functions) for defining calculations, as well as support for PostScript fonts.
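The "unrolling" described in the comment above can be pictured with a toy example. The Python sketch below takes a hypothetical PostScript loop that fills five squares, runs it once the way an interpreter would, and keeps only the resulting PDF content-stream operators (`re` appends a rectangle to the current path, `f` fills it). This illustrates the idea only; it is not how Distiller is actually implemented, and the function name and drawing parameters are made up for the example.

```python
# The PostScript program being "distilled" (a loop, evaluated at run time):
#   0 1 4 { 30 mul 100 add 700 20 20 rectfill } for
# It fills five 20x20 squares, 30 points apart, starting at x = 100.


def unroll_squares(start: int, step: int, limit: int) -> str:
    """Run the loop once, as an interpreter would, and return only the
    resulting drawing operators as a flat PDF content stream.

    Assumes a positive integer step.
    """
    ops = []
    # PostScript `for` includes the limit when the step is positive.
    for i in range(start, limit + 1, step):
        x = i * 30 + 100  # corresponds to `30 mul 100 add`
        # `700 20 20 rectfill` becomes a rectangle path (`re`) plus a fill (`f`)
        ops.append(f"{x} 700 20 20 re f")
    return "\n".join(ops)


content_stream = unroll_squares(0, 1, 4)
print(content_stream)
```

The output has no loop and no variables, just five explicit fill operations: the program logic has been traded for its result, which is why a PDF content stream, unlike a PostScript program, has nothing left to execute.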