Hacker News
by jexp 2 days ago
Hasn't it always been possible to put machine-readable source information into PDF metadata? It's more a problem of the tools and programs generating the PDFs.

We spend millions turning structured information into PDFs and billions to extract the same data from a printer rendering language.

3 comments

Exactly. But we have no real coordination or uniform practice in how we create PDFs across all these programs, so we always end up with a fun mix of what will and won't be static, scalable, or searchable.
Yes, this is already possible. You can look up the ZUGFeRD standard for an example of how this is done for German invoices.
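For anyone wondering what "embedding" means mechanically, here is a rough sketch in plain Python of a one-page PDF that carries an XML file as an attachment. It is not ZUGFeRD-compliant: the real standard builds on PDF/A-3 and additionally requires XMP metadata and a specific invoice XML schema, none of which is shown here. The function name, the file name `invoice.xml`, and the XML payload are made up for illustration, and the file name is assumed to be plain ASCII without parentheses or backslashes.

```python
def build_pdf_with_attachment(filename: str, payload: bytes) -> bytes:
    """Return a blank one-page PDF that carries `payload` as an embedded file."""
    name = filename.encode("ascii")  # assumption: no characters needing escapes
    objects = [
        # 1: catalog, pointing at the page tree and the embedded-files name tree
        b"<< /Type /Catalog /Pages 2 0 R "
        b"/Names << /EmbeddedFiles << /Names [(" + name + b") 4 0 R] >> >> >>",
        # 2: page tree with a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: blank A4 page
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Resources << >> >>",
        # 4: file specification that names the attachment
        b"<< /Type /Filespec /F (" + name + b") /EF << /F 5 0 R >> >>",
        # 5: the attachment bytes themselves, stored uncompressed
        b"<< /Type /EmbeddedFile /Length " + str(len(payload)).encode("ascii")
        + b" >>\nstream\n" + payload + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_pos,
    )
    return bytes(out)


# Hypothetical payload; a real ZUGFeRD invoice uses a standardized XML schema.
pdf = build_pdf_with_attachment(
    "invoice.xml", b"<invoice><total>42.00</total></invoice>"
)
```

Writing `pdf` to disk should give a file whose attachment shows up in viewers that support attachments, and which a program can read back without any OCR or layout guessing. In practice you would use a PDF library rather than hand-assembling bytes like this.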
Exactly. It's pretty insane that we have converged on storing documents as PDF. And it looks like no work is being done on making PDF files machine-readable.