Why do we convert structured data to PDFs?
18 points by n0rlant1s 1245 days ago
Company A has structured data. They put it into a PDF (making it unstructured) and send it to Company B. Company B then has to use PDF-parsing software to turn it back into structured data. Why?
PDF has facilities for tagging documents so they can be reflowed like HTML and viewed on screens of different sizes. That is a boon for accessibility, but it is hard to frame the discussion as being about a better experience for everyone (particularly automated tools) rather than about accessibility alone. (Compare the political complaint that we "can't have good things" because policies that are good for everyone get framed as benefiting a racial or other group perceived as a "special interest".)

I spoke with Larry Masinter at Adobe, and he told me Adobe would like people who want structured data in their PDF documents to simply attach files to the PDF. A scientific paper could contain a CSV file of its data, for instance, or a business document could contain a JSON or XML document.

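To make the "just attach the file" idea concrete, here is a minimal sketch that builds a one-page PDF by hand with only the Python standard library and lists a JSON payload in the catalog's /EmbeddedFiles name tree, which is the PDF specification's mechanism for file attachments. The function name, file name, and invoice fields are hypothetical illustrations, the payload is stored uncompressed, and only plain ASCII file names are handled; a real pipeline would normally use a PDF library rather than writing objects manually.

```python
import json

def _pdf_literal(text: str) -> str:
    # Keep the sketch simple: only plain ASCII names without characters
    # that would need escaping inside a PDF literal string.
    if not text.isascii() or any(c in text for c in "()\\"):
        raise ValueError("sketch only handles plain ASCII file names")
    return f"({text})"

def build_pdf_with_attachment(name: str, payload: bytes) -> bytes:
    """Build a minimal one-page PDF whose catalog lists `payload` as an embedded file."""
    lit = _pdf_literal(name)
    objects = [
        # 1: catalog; /Names /EmbeddedFiles is the name tree viewers show as attachments
        f"<< /Type /Catalog /Pages 2 0 R /Names << /EmbeddedFiles << /Names [{lit} 4 0 R] >> >> >>".encode(),
        # 2: page tree with a single child
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: one blank US Letter page
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 612 792] >>",
        # 4: file specification pointing at the embedded file stream
        f"<< /Type /Filespec /F {lit} /UF {lit} /EF << /F 5 0 R >> >>".encode(),
        # 5: the embedded file stream itself (uncompressed for readability)
        b"<< /Type /EmbeddedFile /Length " + str(len(payload)).encode()
        + b" >>\nstream\n" + payload + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)

# Usage: a hypothetical invoice record travels inside the PDF as JSON.
invoice = {"invoice_id": "INV-001", "total": "125.00", "currency": "EUR"}
pdf_bytes = build_pdf_with_attachment("invoice.json", json.dumps(invoice).encode("utf-8"))
```

Written to disk, `pdf_bytes` should open as an ordinary blank page, while Company B can pull `invoice.json` out of the attachments instead of parsing the rendered text.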
Note that "structured" is not a panacea, because the structure might not be the same in the two organizations. For structured data to be exchanged, the organizations have to agree on some ontology. That happens in some industries some of the time, but it isn't free, and where it isn't in place people still have an excuse to keep using paper processes, or processes that emulate paper.
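A small sketch of why the agreement matters: even when both sides hold structured records, Company B can only consume Company A's fields through some agreed vocabulary. The field names and the two mapping tables below are hypothetical stand-ins for such an agreement; anything the agreement does not cover is reported rather than guessed.

```python
from typing import Any

# Hypothetical internal field names; the shared keys play the role of the agreed ontology.
A_TO_SHARED = {"inv_no": "invoice_id", "amt": "total", "ccy": "currency"}
SHARED_TO_B = {"invoice_id": "InvoiceNumber", "total": "AmountDue", "currency": "CurrencyCode"}

def translate(record_a: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Map a Company A record into Company B's schema via the shared vocabulary.

    Returns the translated record plus the A fields with no agreed meaning."""
    record_b: dict[str, Any] = {}
    unmapped: list[str] = []
    for field, value in record_a.items():
        shared = A_TO_SHARED.get(field)
        if shared is None or shared not in SHARED_TO_B:
            unmapped.append(field)  # no agreement covers this field; a human must decide
            continue
        record_b[SHARED_TO_B[shared]] = value
    return record_b, unmapped
```

For example, `translate({"inv_no": "INV-001", "amt": "125.00", "ccy": "EUR", "po_ref": "PO-7"})` returns `({"InvoiceNumber": "INV-001", "AmountDue": "125.00", "CurrencyCode": "EUR"}, ["po_ref"])`: the purchase-order reference falls through because neither side ever agreed what it means.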