by martin_a 924 days ago
For everybody complaining that PDFs can't be transformed into other formats: there are several PDF standards out in the wild.

In the graphics industry we mainly use PDF/X files. These are very solid and precise in defining the layout and how objects are rendered.

For archiving purposes there's another standard called PDF/A. At its stricter conformance levels (such as PDF/A-1a or PDF/A-2u), PDF/A requires that the text content can be mapped back to Unicode.

So, if you want to be able to convert PDFs back and forth, you should probably use PDF/A. PDF/X files may drop that support in order to keep the rendered appearance as close as possible to what was intended.
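If you want a quick way to see whether a file even claims to be PDF/A, here is a rough Python sketch. It relies on the fact that PDF/A files identify themselves through `pdfaid:part` and `pdfaid:conformance` entries in their XMP metadata. The function name `claimed_pdfa_level` is made up for this example, and scanning the raw bytes is a heuristic that assumes the XMP packet is stored uncompressed. It only reports what the file claims, not whether the file is actually conformant.

```python
import re

# XMP can serialize the properties as elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); \x27 is a single quote.
_PART = re.compile(rb'pdfaid:part(?:>|=["\x27])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|=["\x27])\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes):
    """Return the claimed PDF/A level, e.g. '1b' or '2u', or None.

    Hypothetical helper: a plain byte scan, not a real PDF or XMP parser.
    """
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A identification found in the raw bytes
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return level
```

You would call it as `claimed_pdfa_level(open("some.pdf", "rb").read())`. For real conformance checking you need a proper validator such as veraPDF.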

https://en.wikipedia.org/wiki/PDF/A

2 comments

I would also add that PDFs are often not meant to be transformed. They are the digital equivalent of a book, and no one complains about not being able to edit a book. If you want a document you can edit, you don't use PDF; you work in something else and export it as a PDF at the end. The same is true of images. No one complains about a .jpg not being editable, as any sane person would work in a Photoshop file or similar and only export the final product.
PDF/A is a joke of a “standard” that does almost nothing that is promised on the cover. It is just a subset of PDF with limits on variable options like color representation, frozen at some arbitrary point in time, probably because people working with digital archives realized that they couldn't keep up with a moving target and implement the ever-growing list of features. We can only expect programs that produce PDF/A files to be less “creative” and to emit straightforward markup, but even that is not guaranteed, because PDF/A doesn't address any of the real core format issues.