by herodotus 969 days ago

After working on PDF document reconstruction for more than a decade, I often fantasized about inventing a cleaner and simpler alternative. After all, there are only three kinds of objects in PDF: shapes, images and glyphs. But it is all those little details that will get you in the end. A line: all you need is a coordinate and a length, right? No. Is it solid or dashed? What is its width? Is its end point anchored on the leftmost part of the visible line, or does the thickness spread out from the anchor point? Is the end square or curved? If curved, what are the parameters of the curve? Are both ends the same? On and on it goes. And don't even get me started on glyphs...
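To make "all those little details" concrete, here is a minimal Python sketch that emits the PDF content-stream operators for a single stroked segment. The helper names (`LineStyle`, `line_ops`, `fmt`) are hypothetical and only for illustration. The operators themselves are standard PDF: `w` (line width), `J` (line cap: 0 butt, 1 round, 2 projecting square), `d` (dash array and phase), `m`/`l` (path construction), `S` (stroke) and `q`/`Q` (save/restore graphics state). In PDF the stroke is centered on the path and one cap style applies to both ends, so the format does answer those questions, but every one of them is a parameter a reconstruction tool has to recover. Line join (`j`), miter limit (`M`) and color are left out here for brevity.

```python
from dataclasses import dataclass, field

# PDF line cap styles, selected with the "J" operator.
BUTT_CAP, ROUND_CAP, PROJECTING_SQUARE_CAP = 0, 1, 2


@dataclass
class LineStyle:
    """Hypothetical container for a subset of PDF stroke parameters."""
    width: float = 1.0                          # "w": thickness, centered on the path
    cap: int = BUTT_CAP                         # "J": one cap style for both ends
    dash: list = field(default_factory=list)    # "d": dash array; empty means solid
    dash_phase: float = 0.0                     # "d": offset into the dash pattern


def fmt(x: float) -> str:
    """Format a number compactly, e.g. 2.0 -> "2", 0.5 -> "0.5"."""
    s = f"{x:.4f}".rstrip("0").rstrip(".")
    return s if s else "0"


def line_ops(x0: float, y0: float, x1: float, y1: float, style: LineStyle) -> str:
    """Return content-stream operators that stroke a segment from (x0, y0) to (x1, y1)."""
    dash = " ".join(fmt(d) for d in style.dash)
    return "\n".join([
        "q",                                    # save graphics state
        f"{fmt(style.width)} w",                # line width
        f"{style.cap} J",                       # line cap style
        f"[{dash}] {fmt(style.dash_phase)} d",  # dash pattern
        f"{fmt(x0)} {fmt(y0)} m",               # move to start point
        f"{fmt(x1)} {fmt(y1)} l",               # line to end point
        "S",                                    # stroke the path
        "Q",                                    # restore graphics state
    ])


if __name__ == "__main__":
    # A 2-unit-wide, round-capped, dashed horizontal line.
    print(line_ops(10, 10, 110, 10, LineStyle(width=2, cap=ROUND_CAP, dash=[3, 2])))
```

Even this toy version needs five style-related values for one straight segment, and going the other way (from rendered pixels or a flattened PDF back to those values) is where reconstruction gets hard.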

PDF is a remarkable creation. It has some notable weaknesses, such as the fact that its image color channels do not include alpha, so transparency requires separate masks, but the fact that it covers so much visual complexity in a relatively compact form is just amazing. (BTW: its graphics model comes straight from Adobe PostScript, but PDF content streams are not programs.)

One thing that bugged me while reading this article was the use of the definite article ("the PDF"). Since PDF is an acronym for "Portable Document Format" there may be a grammatical case to be made for the "the", but no one says "the HTML" or "the NASA" and so on.