Hacker News
by layer8 544 days ago
PDFs can have arbitrary files embedded, like XML and JSON. It also supports a logical structure tree (which doesn’t need to correspond to the visual structure) which can carry arbitrary attributes (data) on its structure elements. And then there’s XFA (XML Forms Architecture). You can really have pretty much anything machine-processable you want in a PDF. One could argue that it is too flexible, because any design you can come up with that uses those features for a particular application is unlikely to be very interoperable.
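To make the "arbitrary files embedded" part concrete, here is a minimal hand-rolled sketch using only the Python standard library (not Acrobat or any PDF library's API). It assumes the embedded-file mechanism from the PDF spec: the document catalog's `/Names` dictionary holds an `/EmbeddedFiles` name tree, each entry points to a `/Filespec` dictionary, and that in turn points to an `/EmbeddedFile` stream carrying the raw bytes. The function name `build_pdf_with_attachment` and the sample JSON are hypothetical. The output is uncompressed and omits optional extras such as checksums or PDF/A-3 file associations.

```python
import json


def build_pdf_with_attachment(name: str, payload: bytes) -> bytes:
    """Hypothetical helper: build a one-page PDF whose /EmbeddedFiles name
    tree holds `payload` under the file name `name` (uncompressed)."""
    # Sketch only: no escaping of PDF string syntax, so reject risky names.
    if not name.isascii() or any(c in name for c in "()\\"):
        raise ValueError("sketch only handles ASCII names without (, ) or \\")
    n = name.encode()
    objects = [
        # 1: document catalog; /Names -> /EmbeddedFiles lists attachments
        b"<< /Type /Catalog /Pages 2 0 R /Names << /EmbeddedFiles 4 0 R >> >>",
        # 2: page tree with a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: one empty US-Letter page
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        # 4: name tree root (single leaf) mapping the file name to its filespec
        b"<< /Names [(" + n + b") 5 0 R] >>",
        # 5: file specification dictionary pointing at the embedded stream
        b"<< /Type /Filespec /F (" + n + b") /UF (" + n + b") /EF << /F 6 0 R >> >>",
        # 6: the embedded file stream; "/" in the MIME subtype is written as #2F
        b"<< /Type /EmbeddedFile /Subtype /application#2Fjson /Length "
        + str(len(payload)).encode()
        + b" >>\nstream\n" + payload + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for i, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "i 0 obj" for the xref table
        out += b"%d 0 obj\n" % i + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus free entry 0.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_pos,
    )
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical structured data to carry alongside the visual document
    data = json.dumps({"invoice": "INV-001", "total": 42.5}).encode()
    pdf = build_pdf_with_attachment("data.json", data)
    print(len(pdf), b"/EmbeddedFiles" in pdf)
```

Writing the returned bytes to a `.pdf` file should give a blank page whose attachments panel lists `data.json` in viewers that support embedded files. A real tool would also compress the stream and escape names properly, which is part of why in practice you'd reach for existing PDF software.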

Nice! I looked for metadata and found only an anemic set of document properties, but embedding a whole file with structured data makes perfect sense.

But of course this only pushes the responsibility back a step: why the heck isn’t Adobe pushing developers to include structured data in their output?

Every time you “Save as PDF” there should be a checkbox defaulted on to “Save Data to PDF”.

They’ve already done the first thing; why not do the second thing to make everyone’s life easier?

Adobe wants you to purchase Acrobat to be able to do that. Their strategy is to give you Reader for free, but for authoring they want you to buy their software. However, there’s also third-party PDF software one can use for that. And apparently Google Drive supports it too: https://www.wikihow.com/Attach-a-File-to-a-PDF-Document