|
|
|
|
|
by rqtwteye
1183 days ago
|
|
I still don't understand how we ended up with PDF as sort of standard to archive data. PDF is already pretty bad for things like manuals but for things like spreadsheets we basically collect the data, then we destroy all the structure by putting it in into POF, and later on we painstakingly try to restore the data from PDF which is often almost impossible to do with accuracy. It just shows that bad solutions often win. |
|
- Non-responsive (compared to HTML). Allows PDFs to serve as a common standard between other document formats with different resizing logic, like Latex and Word.
- Difficultly of network access from code running inside document. Allows PDFs to generally operate offline. Nobody's brave enough to try to write a single page application in a PDF
- Destroying data structure. Allows forward compatibility with anything that can be displayed statically on a screen. New applications can have different ideas about how tables, text or charts should work but if there's static visual output then it'll convert to PDF. Awareness of say, the structure of tables is precisely what makes it so difficult for say google sheets and excel to stay compatible with each other's new table features. If somebody develops a new language with new characters not even in Unicode it'll still work on a PDF
It's also worth noting that most PDF limitations have the characteristic of making things hard but not absolutely impossible. These escape hatches prevent people with hard requirements from actually moving to a new format.
If it were truly impossible to get invoice data from PDFs people might've shifted to a different format for business transactions. But if it's merely difficult some company will come up with an API that works as a good enough extraction solution whose cost is justified by the other compatibility benefits of PDFs, so the ecosystem stays with PDFs.