by harshreality 1020 days ago
There's nothing immutable about pdfs. If you have an "original" document, it'll always hash to whatever it hashes to. I fail to see the point. You can cite md5 hashes on LG the same whether they're pdfs or epubs or, heaven forbid, azw3 (amazon's proprietary epub-like format).

What's the obsession with "looking the same everywhere"?

Page references: this shouldn't be a thing. Academia has already solved this problem for notable texts. Rather than nearly uncountable numbers of paragraphs that all run together, paragraphs or short sections or lines are numbered. See any good edition of Plato or Aristotle, or just about any notable play or longer poem ever translated. Relying on a single published layout of a work to reference is dumb.

Citing exact line numbers isn't even necessary for native-language works. When they're digital, search works. It works even better in flowed-format texts than it does in pdfs, which sometimes, depending on how the pdf was constructed, won't match text properly across newlines.
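To make the newline problem concrete, here is a rough sketch of how a search can be made to work regardless of where lines happen to break. The helper names (`normalize`, `find_phrase`) and the sample text are made up for illustration; this is not how any particular PDF or epub reader implements search.

```python
import re


def normalize(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


def find_phrase(document: str, phrase: str) -> int:
    """Return the offset of phrase in the normalized, case-folded document, or -1."""
    return normalize(document).casefold().find(normalize(phrase).casefold())


# Hypothetical text as it might come out of a PDF, with hard line breaks.
extracted = "The quick\nbrown fox\njumped over the lazy dog."

print("quick brown fox" in extracted)                   # False: the newline breaks a naive match
print(find_phrase(extracted, "Quick brown fox") >= 0)   # True: found after normalization
```

Flowed formats store the sentence without hard line breaks in the first place, so the naive search already succeeds there.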

Visual quality: As long as images (data, charts, graphs, photographs) are not degraded beyond usefulness, the actual text, and its display, is up to the reader application. Everyone uses the web complete with mathjax, and that doesn't have Knuth-approved formatting in every respect. But it's good enough, and it works everywhere on every device without squinting or pinch to zoom. There are some people who insist on putting pre-rendered images of math in html, and they always look worse, because they don't match the text without a lot of work to have extra high-res images that are auto-scaled according to viewport and surrounding font size. That's work that I bet not many people have ever done in the history of html publishing.

3 comments

that's all missing the point.

mhtml would somewhat fit part of the bill of what PDF offers: a single downloadable "file" you can archive or forward and you know: the recipient will see exactly what you saw.

however the mhtml doesn't look the same, depending on the device. and looking exactly the same helps a great deal in convincing a judge that we are all talking about the same thing.

don't get me wrong.

I hate PDF with all the passion of my heart. epub (similar to mhtml) imho is a much better format for many intents and purposes, and it allows the contents to reflow depending on the device.

but the claim was "PDF is useless and shall go" and that's cutting it too short.

I agree PDF is not entirely useless. I don't mind it that much for papers, but I don't see how it's any better than responsive formats. It's obviously important for presentations where exact visual relationships between items are fixed, or flyers (things that inherently get published on dead trees; leaving layout to an html rendering engine in that case is not great), or other things like that.

I still don't understand why it needs to look exactly the same. I get that habitually people say "turn to page X and look 2/3 down the page for the line starting 'The quick brown fox jumped'", but with digital documents that's not, in my experience, how anything works. You just say "the sentence starting with 'The quick brown fox'", and everyone can search for it in a few seconds.

If an official proceeding needs to be sure everyone's working from the same document, they can distribute, or publish hashes of, an epub or mhtml the same as they can for a pdf. There's no assurance that two pdfs that you think are the same document are actually the same document, any more than two epubs would be.
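As a rough illustration of why the container format doesn't matter here: a hash is computed over raw bytes, so the same few lines work for a pdf, an epub, or an mhtml file. This is only a sketch; the function names and the choice of SHA-256 as the default are my assumptions, not anything a court or archive actually mandates.

```python
import hashlib


def digest_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of a byte string; the file format is irrelevant to the hash."""
    return hashlib.new(algorithm, data).hexdigest()


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hex digest of a file, read in chunks so large documents fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_document(published_digest: str, data: bytes, algorithm: str = "sha256") -> bool:
    """True if the bytes in hand match the digest that was published."""
    return digest_bytes(data, algorithm) == published_digest.lower()
```

Two files that render identically can still hash differently (different metadata, different compression), which holds for pdfs just as much as for epubs. Only the published digest settles which file everyone is supposed to be working from.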

Most academic PDFs are typeset and consequently look better than typical web sites. There are notable exceptions such as distill.pub.

How does line number citation work for responsive text?

You put line numbers in the margin (with css styling), or I've also seen it as [#] inline in the text, possibly styled differently to make it more intuitive that it's not part of the source text.
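A minimal sketch of the inline [#] variant, assuming the text is plain text with blank lines between paragraphs and that paragraphs are the unit being numbered. The function names are hypothetical, and a real edition would number by whatever unit its editors chose.

```python
def split_units(text: str) -> list[str]:
    """Split text into paragraphs on blank lines, dropping empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def number_units(text: str) -> str:
    """Prefix each paragraph with a stable [n] marker that survives reflow."""
    return "\n\n".join(f"[{i}] {u}" for i, u in enumerate(split_units(text), start=1))


def cite(text: str, n: int) -> str:
    """Look up paragraph n (1-based), independent of page or screen layout."""
    return split_units(text)[n - 1]
```

Because the marker is attached to the paragraph rather than to a position on a page, a reference like "[2]" resolves to the same text on a phone, an e-reader, or a printout.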

For the vast majority of works that are untranslated, that isn't necessary, because, as mentioned, search works fine, and it's faster, too. For translated works, the concept of one published source of truth for page numbers is already broken, so you need some alternative to page numbers anyway.