by nabaraz 493 days ago
On a similar note, why hasn't PDF been replaced? There are XPS, DjVu, and XHTML (EPUB), but they all seem to target different use cases (e.g. a packaged HTML file).

What I want is a simple document format that allows embedding other files and metadata without Adobe's bloat. I should be able to hyperlink within pages, change font size, etc., without text overflowing, and still be able to print in a consistent manner.


I don't think what makes PDF an 'unfortunate' format for (1) editing, (2) on-device reading, and (3) extraction of semantic information (as opposed to presentational information) is any sin on Adobe's part, nor is it 'bloat.'

It's a page description format, not a data format, so all its decisions follow from the need to ensure that you and I can both print the same 'page' even if we use different operating systems, software, printers, exact paper dimensions, etc. I suspect the main reason it holds on so well is that so many things operate in a document paradigm, where 'document' means 'collection of sheets of paper.' Everything from the doctor's After-Visit Summary to your car registration already has a specific visual representation, chosen so that it fits sensibly and precisely on sheets of paper.

Could HTML (say, with data URLs for its images and CSS so that it can stand on its own) or ePub be a better format in most ways? Sort of, but they are optimized for such a different goal that if you tried to evangelize the switch to everyone who makes PDFs today, you'd be met with frustration that the content looks a bit different on every device and that, depending on settings, even the page breaks fall differently.
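For the "stand on its own" part, here is a minimal sketch using only Python's standard library. The function names `to_data_url` and `standalone_html` are hypothetical, made up for illustration. The image is embedded as a base64 `data:` URL and the CSS is inlined in a `<style>` element, so the result is a single self-contained file. Note that it does nothing about the layout problem above: the browser still reflows it to fit each window.

```python
import base64
import html


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URL so no external file is needed."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def standalone_html(title: str, text: str, css: str,
                    image: bytes, image_mime: str) -> str:
    """Build one HTML string with its CSS and a single image embedded inline."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>"
        f"<style>{css}</style></head>"          # CSS inlined, no <link> to fetch
        f"<body><p>{html.escape(text)}</p>"
        f'<img src="{to_data_url(image, image_mime)}" alt="">'
        "</body></html>\n"
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    page = standalone_html("Visit summary", "Take with food.",
                           "body { font-family: serif; }",
                           b"\x89PNG\r\n\x1a\n", "image/png")
    print(page)
```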

Relatedly, it's interesting to me that even Google Docs, which I suspect is printed or converted to PDF far less than half the time, defaults to the "paged" mode (see Page Setup), which shows page borders and margins, instead of the far more useful "Pageless" mode, which behaves more like a normal web page: it fits the window and scrolls as one endless continuous surface.

Different use cases.

"Without text overflowing" hides a lot of detail. In PDF, every letter/character/glyph of text can have an exact x,y position on the page (or sometimes off the page). This allows precise positioning of content regardless of what else is going on. It is up to the application that writes the PDF to position things correctly and to implement letter or word wrapping.
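To make the "the writer does the wrapping" point concrete, here is a minimal sketch that wraps a paragraph itself and then emits a PDF page content stream placing each line at an absolute position with the `Tm` (set text matrix) and `Tj` (show string) operators. Assumptions: the page's resources map the name `/F1` to the built-in Courier font, where every glyph is 600/1000 em wide, so line widths can be computed by counting characters; real fonts need per-glyph widths. This produces only the content stream, not a complete PDF file, and `layout_paragraph` is a hypothetical helper.

```python
import textwrap

# Assumption: /F1 is Courier, whose glyphs are all 600 units per 1000 em wide.
COURIER_WIDTH_UNITS = 600


def pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def layout_paragraph(text: str, x: float, y: float, width: float,
                     font_size: float = 12.0, leading: float = 14.0) -> str:
    """Wrap text to `width` points, then place every line at an exact x, y."""
    # The PDF viewer will not wrap anything; the writer decides the line breaks.
    max_chars = max(1, int(width * 1000 // (COURIER_WIDTH_UNITS * font_size)))
    lines = textwrap.wrap(text, max_chars)
    ops = ["BT", f"/F1 {font_size:g} Tf"]
    for i, line in enumerate(lines):
        # Tm sets the text matrix outright, so each line starts at absolute page
        # coordinates (origin at bottom-left, y grows upwards).
        ops.append(f"1 0 0 1 {x:g} {y - i * leading:g} Tm")
        ops.append(f"{pdf_string(line)} Tj")
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    print(layout_paragraph("Take one tablet twice daily (with food).", 72, 720, 144))
```

With a 144-point column at 12 points, each Courier glyph advances 7.2 points, so at most 20 characters fit per line. The parentheses are always escaped because an unbalanced one, as in the second line here, would otherwise break the string.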

XPS was the closest to reimplementing PDF, but Microsoft didn't get enough buy-in from other parties, so it quietly died.

An interesting aspect of PDFs that I didn't know until quite recently is that they're essentially a cut-down descendant of PostScript, and that in fact accounts for some of their heftiness. PostScript is a full-on programming language (albeit an unusual one), but PDF is not (i.e., it's not Turing complete). It does not support control flow, so what could be expressed as a simple loop in PostScript must be unrolled and stored as a series of simple declarations/expressions in a PDF.

The advantage is that PDFs don't need a full program interpreter to be rendered.
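Here is a small illustration of that unrolling, as a sketch rather than what any particular PDF library does. The PostScript loop below (kept as a string for comparison, not executed) draws five horizontal rules 20 points apart. A PDF content stream has no `for`, so the program writing the PDF runs the loop itself and emits one `m` (moveto) / `l` (lineto) pair per iteration, followed by a single `S` (stroke). The helper name `unrolled_rules` and the number formatting are assumptions; real writers may format numbers differently or compress the stream.

```python
# For comparison only: the PostScript version, where the renderer runs the loop.
POSTSCRIPT_LOOP = "0 1 4 { 20 mul 100 add 72 1 index moveto 540 exch lineto } for stroke"


def unrolled_rules(count: int, x0: float, x1: float, y0: float, step: float) -> str:
    """Emit the flat PDF path operators that replace the PostScript loop."""
    ops = []
    for i in range(count):  # the loop runs in the generator, not in the viewer
        y = y0 + i * step
        ops.append(f"{x0:g} {y:g} m {x1:g} {y:g} l")  # moveto + lineto
    ops.append("S")  # stroke the whole path once, like PostScript's `stroke`
    return "\n".join(ops)


if __name__ == "__main__":
    print(unrolled_rules(5, 72, 540, 100, 20))
```

The output grows linearly with the iteration count, which is the heftiness trade-off: the file is larger, but a renderer only has to read a flat list of operators.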

Allow me to introduce you to PDF JavaScript…
They ran Doom on it.
Because as soon as this conversation starts, the LaTeX crowd shows up, and everyone who has something meaningful to add toward a standard gets blocked by that discussion.
I appreciate the dedication in making an account just to say this :D
One reason is that none of those other formats are suitable for commercial printing as-is.
Because it works, and it works well enough. Also, immutability is a feature, not a bug.