by guidoism 3275 days ago
A few weeks ago I would have made the same statement but I started reading the PDF implementation docs and now I really like the format.

The main issue I think we all have with the format is that people make docs that are almost impossible to read on a small screen.

There are ways around this: 1. Tagged PDFs present the underlying content and semantics so that it can be reflowed for accessibility purposes, though right now very few people seem to use this feature, and 2. Maybe it wouldn't be a bad thing to make PDF pages closer to a paperback book than an A4 page, with the resulting shorter line length and reduced margins.

PDF is indeed more complex than plain HTML with some cribbed CSS, but in many ways it's a lot better: 1. It is truly portable in the sense that every computer will render it in exactly the same way, 2. It packages up all assets in an efficient manner (only the glyphs that are needed are included, not the entire font with all glyphs and position hints like web fonts), 3. The expensive layout computation is done once, on a computer in a galaxy far far away from my battery-limited phone, and 4. PDFs are (by convention) free from all of the cacophony of crap like share buttons and navigation chrome and ads and articles-you-may-enjoy fluff.

The format itself is actually not that bad. It's a text format in that it's relatively easy to open up a text editor and bang one out. The only inconveniences are the places where you need to state exactly how long strings are (which your text editor can help with) and the creation of the index at the end (which I've been cheating on by just running my hand-created PDFs through a PDF lint-like utility).
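To make the two inconveniences concrete, here is a minimal Python sketch that does the bookkeeping instead of a text editor or lint tool: it fills in the `/Length` of the content stream and builds the index at the end (the cross-reference table) from byte offsets. The function name `build_pdf`, the page size, and the Helvetica font are my own choices for illustration, and it assumes the text contains no parentheses or backslashes, which would need escaping.

```python
def build_pdf(text: str) -> bytes:
    """Assemble a one-page PDF, computing the stream length and xref offsets.

    Assumes `text` is plain ASCII without '(', ')' or '\\' (no escaping is done).
    """
    # Page content: select font F1 at 24pt, move to (72, 720), draw the text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        # First inconvenience: the stream must declare its exact byte length.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Second inconvenience: the index at the end, one 20-byte entry per object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing the result of `build_pdf("Hello")` to a file in binary mode should give something a viewer can open, and the output is still readable in a text editor because nothing is compressed.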

The reason why most PDFs look crazy when opened up in a text editor is that the streams are almost always compressed. You can uncompress them with "qpdf --stream-data=uncompress in.pdf out.pdf"
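If you just want to peek without qpdf, the compression is most often zlib/deflate (the FlateDecode filter), so a rough Python sketch can inflate the streams directly. This is a naive scan, not a parser: the name `inflate_streams` is made up, it ignores the `/Length` and `/Filter` entries entirely, and it will miss object streams, encrypted files, and other filters.

```python
import re
import zlib


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Naively find stream...endstream spans and try to zlib-inflate each one."""
    results = []
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed (or uses another filter): keep bytes as-is.
            results.append(raw)
    return results
```

For anything beyond a quick look, the qpdf command above is the safer route, since it actually parses the file.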

2 comments

I think the format itself could be ok, but not in its current form.

The PDF header can be anywhere in the document. This makes parsing for bad content harder. (Also, you can have a valid PDF that is also a valid ZIP with the contents being the original file. How do you virus scan this?)
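A small Python sketch of why this is awkward for scanners: the PDF marker is found by searching forward, while ZIP readers locate their directory from the end of the file, so one blob can satisfy both. The function name `describe_polyglot` and the returned keys are hypothetical, and this only detects the overlap; it says nothing about whether the content is malicious.

```python
import io
import zipfile


def describe_polyglot(data: bytes) -> dict:
    """Report where the PDF header sits and whether the bytes also parse as ZIP."""
    header_offset = data.find(b"%PDF-")  # -1 if no header marker is present
    return {
        "pdf_header_offset": header_offset,
        "header_at_start": header_offset == 0,
        # is_zipfile looks for the ZIP end-of-central-directory record,
        # so leading non-ZIP bytes do not stop it from matching.
        "also_zip": zipfile.is_zipfile(io.BytesIO(data)),
    }
```

A scanner that decides the file type from the first few bytes alone would classify such a file one way and never look at the other interpretation.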

Too many old image formats are allowed, with readers only supporting a subset. TIFF alone has a bunch of options that are often broken. What happens when you put a multipage TIFF in a PDF? (I think you just see the first page in Reader, but some other reader might allow you to browse them.)

Lots of features in later versions that are not well supported. (forms, document libraries, scripting?)

It has been a while since I left the document area, but while I liked simple PDFs, once you say you support them, you have to support all of them, which is almost impossible to do correctly. The later specs just have too many features that are almost unused but add a lot of complexity that really isn't needed in a portable document format. A stripped down/cleaned up version of the spec would be nice.

The share buttons and navigation chrome certainly can be put in a PDF; they're just incredibly uncommon. I've even played video games that were distributed as PDF files.

PDF games? Do you have an example you could send me, perhaps? I'm really curious.

The ones I played were adventure games, basically a bunch of areas (implemented as pages) linked by buttons, with some extra logic thrown in. If you can imagine the kind of scripting capabilities you'd need to run a PowerPoint-style presentation from a PDF file, and the kind of scripting you'd need to make sophisticated interactive forms, you're on the right track.

Unfortunately I don't know what the games I played were called.