by maxxxxx
3266 days ago
I still don't understand how PDF could become one of the standards for publishing documents. Well-structured content gets converted into PDF, which loses most of that structure. Then a lot of work goes into guessing that structure from the PDF and converting it back to a better file format. It just shows that successful solutions don't have to be technically good.
(Its PostScript origins may also explain the bizarre mix of text and binary that constitutes the file format. For example, page contents are written in a relatively free-form, PostScript-ish, RPN-like textual language, but are found in "content streams" which may be compressed or encoded into a binary format. Data "object" structures include things like '<<'-delimited dictionaries, '[' arrays ']', textual "/Names", and even provisions for comments(!?).
Then there are things like the cross-reference table of all objects in the file, which is an array of fixed-width textual numbers representing file offsets, e.g. "0000001056 00000 n" refers to an object 1056 bytes from the start of the file. Reactions of "WTF!?" from those working with the format for the first time are not uncommon.)
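
To make the cross-reference entry format concrete, here is a minimal Python sketch that pulls apart one entry like the one quoted above. It assumes the classic textual table layout (a 10-digit byte offset, a 5-digit generation number, then 'n' for in use or 'f' for free) and does not cover the compressed cross-reference streams that newer PDFs may use instead. The names `XREF_ENTRY` and `parse_xref_entry` are made up for illustration and are not from any PDF library.

```python
import re

# One classic cross-reference entry: a 10-digit byte offset, a 5-digit
# generation number, and a flag ('n' = in use, 'f' = free).
# XREF_ENTRY and parse_xref_entry are hypothetical names for this sketch.
XREF_ENTRY = re.compile(rb"^(\d{10}) (\d{5}) ([nf])")


def parse_xref_entry(line: bytes):
    """Return (offset, generation, in_use) for one textual xref entry."""
    m = XREF_ENTRY.match(line)
    if m is None:
        raise ValueError(f"not an xref entry: {line!r}")
    offset, generation, flag = m.groups()
    # int() happily ignores the zero padding, so b"0000001056" becomes 1056.
    return int(offset), int(generation), flag == b"n"


print(parse_xref_entry(b"0000001056 00000 n \n"))  # (1056, 0, True)
```

The fixed width is what makes this workable without any index: a reader can seek straight to the Nth entry by arithmetic, then seek again to the offset it names.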