by gettalong 1045 days ago

The content of a PDF file is not like the content of, say, an HTML or ODT file. Those formats store plain text with formatting instructions, and the application has to do all the layout work, like glyph positioning (which is already a hard task), paragraph layout (Where to break the lines? How many lines for widows? ...) and so on.

A PDF file is essentially pre-rendered. The application creating the PDF file has to do all the layout work mentioned above, and the PDF itself just contains instructions saying which glyph should be rendered at which exact position on the page.
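
To make that concrete, here is a rough Python sketch of what a PDF producer does: it breaks the lines itself and emits text operators with absolute positions. The operators (`BT`, `Tf`, `Tm`, `Tj`, `ET`) are real PDF content stream operators, but everything else is an assumption for illustration: the function names are made up, `/F1` is a hypothetical font resource name that would have to be mapped to Courier in the page resources, and the greedy line breaking is about the simplest thing that could work.

```python
# Courier is monospaced: every glyph advances 600/1000 of the font size.
COURIER_ADVANCE = 0.6


def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def layout_to_content_stream(text, font_size=12, left=72, top=720,
                             max_width=200, leading=14):
    """Greedy line breaking, emitting one positioned text run per word.

    All the layout decisions happen here, in the producer. The output
    only records where each run ends up on the page.
    """
    char_width = COURIER_ADVANCE * font_size
    ops = ["BT", f"/F1 {font_size} Tf"]  # /F1 is an assumed resource name
    x, y = left, top
    for word in text.split():
        word_width = len(word) * char_width
        # Break the line if this word would overflow the text width.
        if x > left and x + word_width > left + max_width:
            x = left
            y -= leading
        # Tm sets an absolute text position, Tj shows the string there.
        ops.append(f"1 0 0 1 {x:.1f} {y:.1f} Tm ({escape_pdf_string(word)}) Tj")
        x += word_width + char_width  # advance past the word plus one space
    ops.append("ET")
    return "\n".join(ops)


print(layout_to_content_stream("Hello PDF world", max_width=100))
```

Note that nothing in the output says "this is a paragraph" or "a line break happened here". There are only coordinates and strings.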

This makes displaying or printing a PDF much easier (though still a hard task). It is also the reason why editing PDFs is hard: the additional information, like what is a paragraph, a heading ..., is usually not available.
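
As a small illustration of the editing problem, this is roughly what a tool has to do just to get lines of text back from positioned fragments. It is a heuristic sketch, not how any particular PDF library works: the `(x, y, text)` fragment format, the function name and the `y_tolerance` value are all assumptions.

```python
def group_into_lines(fragments, y_tolerance=2.0):
    """Guess text lines from (x, y, text) fragments.

    PDF coordinates grow upwards, so a larger y is higher on the page.
    Fragments whose baselines are within y_tolerance are treated as
    being on the same line.
    """
    lines = []
    # Walk the fragments top to bottom, then left to right.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if lines and abs(lines[-1]["y"] - y) <= y_tolerance:
            lines[-1]["parts"].append((x, text))
        else:
            lines.append({"y": y, "parts": [(x, text)]})
    # Within each guessed line, order the parts by x and join with spaces.
    return [" ".join(t for _, t in sorted(line["parts"])) for line in lines]


fragments = [
    (115.2, 720.0, "PDF"),
    (72.0, 706.0, "world"),
    (72.0, 720.5, "Hello"),
]
print(group_into_lines(fragments))
```

Even when this guess is right, it still cannot tell whether "world" continues the same paragraph or starts a new one, or whether any of it is a heading.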

FYI: Tagged PDF has all that structural information, and there are developments to allow, e.g., reflowing of PDFs on smaller devices.