by jeffreportmill1 1535 days ago
Off topic, but man is that document hard to use as a reference. Ironically, I wish they would publish it as HTML broken down by chapter and section.

(I have used that document a lot to write a custom PDF generator and parser in Java, using a downloaded copy)

> Ironically, I wish they would publish it as HTML broken down by chapter and section.

I wish there was an EPUB version of the document. Do PDFs support reflowable content?

I believe one of the selling points of PDFs was the absolute lack of reflowing content.

Right, as the point is to represent a physical document, paper and ink (or canvas, toner, whatever -- stuff that doesn't reflow).

Why anyone would use such a format for these situations, where the audience definitely cares way more about consuming it on an electronic device than printing it out, is... mind-boggling.

Of course, AI+ML to the rescue: Liquid Mode [0].

> Files are processed in our secure data servers and immediately deleted from our servers after the experience is generated.

[0] https://www.adobe.com/devnet-docs/acrobat/android/en/lmode.h...

I've found that when equations and text are intermixed, a precisely controlled layout can be easier to read than reflowing content. Other than that, not so much.

Edit: Non-reflowing content also works well if you need to refer people to page numbers and paragraphs.

I look forward to playing with Liquid Mode at some point soon.

CSS flow control and an `id` attribute value used as a URL fragment would be my solutions to those particular concerns, except that our context here is capturing output from software that offers printing but doesn't export to HTML very well. I think the solution might be "bring it to a good web dev and have a solid punch list."

A PDF can be reflowed without reconstructive processing only if it was generated as a Tagged PDF [1] and the viewer supports reflowing.

[1]: Essentially a PDF with its own EPUB inside it, but unlike just having an attached EPUB, there is a map between the page layout of the PDF and the tags.

There are implementations of reconstructive reflowing that infer the layout block structure and reading order and can reflow a two column paper into a single column.
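
To make the reconstructive idea concrete, here is a minimal sketch in Python. It assumes the text blocks have already been extracted with bounding boxes and that the page is a plain two-column layout with no full-width titles or figures. The `Block` type and the `reading_order` and `reflow` functions are hypothetical names for illustration, not part of any PDF library, and real implementations infer far more structure than this.

```python
from dataclasses import dataclass


@dataclass
class Block:
    x0: float  # left edge
    y0: float  # top edge (y grows downward in this sketch)
    x1: float  # right edge
    text: str


def reading_order(blocks, page_width):
    """Order blocks as left column top-to-bottom, then right column."""
    mid = page_width / 2
    # Assign each block to a column by the horizontal center of its box.
    left = [b for b in blocks if (b.x0 + b.x1) / 2 < mid]
    right = [b for b in blocks if (b.x0 + b.x1) / 2 >= mid]
    by_top = lambda b: b.y0
    return sorted(left, key=by_top) + sorted(right, key=by_top)


def reflow(blocks, page_width):
    """Join the ordered blocks into a single column of paragraphs."""
    return "\n\n".join(b.text for b in reading_order(blocks, page_width))
```

The midline split is the crudest possible column detector; anything with uneven columns or spanning elements would need gap detection or clustering instead.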

PDFs can support tables of contents with labeled chapters and sections. Not sure if the feature is standardized, but it's there.

The specification does have a hierarchical outline, and you can click on cross references too. Of course, navigation can still be cumbersome, and linking to chapters can be awkward (tip: right-clicking an outline element and copying the link works in Firefox).
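
If you want to build such links by hand, here is a minimal sketch in Python. It assumes the reader's viewer honors the `#page=N` fragment, which the built-in viewers in Firefox and Chrome do as far as I know; other viewers may ignore it. `pdf_page_link` is a hypothetical helper name.

```python
from urllib.parse import urldefrag


def pdf_page_link(url, page):
    """Return `url` with its fragment replaced by a `#page=N` fragment."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    base, _old_fragment = urldefrag(url)  # drop any existing fragment
    return f"{base}#page={page}"
```

Note that this targets a physical page number, not a section, so the link goes stale if a new edition shifts the pagination.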

There are some problems with the spec though, and navigation is not the most pressing one. The spec is huge, and support for its less-used parts is spotty across PDF readers. It also has inaccuracies (not corrected in errata) and underspecified parts.

> hard to use as a reference

How so? I frequently reference specific sections, tables or pages of the spec at work.