by jimws 2398 days ago
> Looks just like PDF.

It's better than PDF. Text in PDF is not reflowable; this demo is, by virtue of being HTML under the hood. It looks good even after resizing the browser.

I wish PDFs had a reflowable mode. What would it take to add such a mode to the PDF spec without breaking backward compatibility?

PDF's fixed layout is a real problem when we try to convert PDF books containing math to EPUB or MOBI with Calibre. It really messes up the beautifully typeset math in the PDF during the conversion. What's a good solution for this?


> It's better than PDF. Text in PDF is not reflowable.

This is relative. Many people dislike reflowable text, and it is often not very convenient. Book typesetting is hard, and once it is done, reflowing the text changes the book, usually for the worse. Maybe not for plain text, but definitely for math and poetry, where the "flow" is a crucial part of the work and was carefully set up by the author. You never want to "reflow" math or poetry. For these two subjects (my favorite ones, incidentally), PDF seems to be the best option.

PDF is optimized for printing paged media. When I read (instead of skim) physics and math papers, I still prefer to print them out. Pagination support is still very primitive in the web stack, at least when I last tried https://www.pagedmedia.org/paged-js/. EPUBs are good until you have equation blocks, floats, etc.; then they fall pretty far short of TeX/LaTeX-level typesetting.

> I wish PDFs had a reflowable mode. What would it take to add such a mode to the PDF spec without breaking backward compatibility?

PDF has everything, including the kitchen sink. There’s a fairly esoteric feature called “tagged PDF”, which embeds extra markup commands within the page description that map it to a logical structure and to alternative textual presentations (e.g. MathML). The problem is that nobody uses it, as it’s extremely hard to generate: most software that does text layout discards high-level semantic details fairly early on, making them hard to retrofit (pdfTeX, for example, only recently started inserting space characters between words!). Likewise, PDF viewers don’t bother to support it.

Reflowing a tagged PDF is fairly easy... once you have the PDF, which of course you don’t.

It’s a hard problem.
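To illustrate why reflow becomes easy once the logical structure exists, here is a minimal Python sketch. It does not parse a real PDF: the `StructElem` class, the `to_html` function, and the sample tree are hypothetical stand-ins for the structure elements a tagged PDF carries (standard types such as `H1`, `P`, and `Formula`, with `Alt` text as a fallback description). Walking that tree and emitting HTML is all it takes to get text a browser can reflow.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class StructElem:
    """Hypothetical, simplified stand-in for a tagged-PDF structure element."""
    kind: str                  # structure type, e.g. "Document", "H1", "P", "Formula"
    text: str = ""             # text gathered from the page's marked content
    alt: str = ""              # alternate description, used here for formulas
    children: list["StructElem"] = field(default_factory=list)


# Map a few structure types onto reflowable HTML tags; anything else becomes a <div>.
TAGS = {"Document": "body", "Sect": "section", "H1": "h1", "P": "p"}


def to_html(elem: StructElem) -> str:
    """Recursively turn the logical structure tree into HTML."""
    if elem.kind == "Formula":
        # Without MathML we can only fall back to the alternate text.
        return f'<span class="formula">{escape(elem.alt or elem.text)}</span>'
    tag = TAGS.get(elem.kind, "div")
    inner = escape(elem.text) + "".join(to_html(c) for c in elem.children)
    return f"<{tag}>{inner}</{tag}>"


doc = StructElem("Document", children=[
    StructElem("H1", text="Introduction"),
    StructElem("P", text="The energy is ", children=[
        StructElem("Formula", alt="E = m c^2"),
    ]),
])

print(to_html(doc))
```

The hard part the comment describes is not this walk but getting the tree in the first place: an untagged PDF only has positioned glyphs, so `doc` would have to be guessed from layout.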

I heard tagged PDF is one of the goals of LaTeX3, and LaTeX3 work has thus far been available as packages on top of LaTeX2e, which means it should work on LaTeX2e too. I’ve never tried it though.
ConTeXt supports tagged PDF out of the box.

That’s the point of PDFs versus EPUBs. If you want resizable, reflowable text, use EPUB, which is essentially an organized, compressed bundle of (X)HTML.
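To make "an organized and compressed version of HTML" concrete, here is a minimal sketch that builds a bare-bones EPUB 3 package in memory with Python's standard `zipfile` module. The file names under `OEBPS/`, the placeholder identifier, and the `build_epub` function are illustrative choices, and the result has not been checked with a validator such as epubcheck. The key points are that the archive is an ordinary ZIP, the `mimetype` entry comes first and is stored uncompressed, and the actual content is plain XHTML that a reader can reflow.

```python
import io
import zipfile

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Package document: metadata, the list of files (manifest), and reading order (spine).
OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2020-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine><itemref idref="ch1"/></spine>
</package>"""

NAV = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol></nav></body>
</html>"""

CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title></head>
<body><p>Plain XHTML, so a reader can reflow it to any screen width.</p></body>
</html>"""


def build_epub() -> bytes:
    """Return the bytes of a minimal EPUB archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must be first and stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        files = {
            "META-INF/container.xml": CONTAINER,
            "OEBPS/content.opf": OPF,
            "OEBPS/nav.xhtml": NAV,
            "OEBPS/ch1.xhtml": CHAPTER,
        }
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing `build_epub()` to a file named `example.epub` should give something most EPUB readers can open; everything visible to the reader lives in `ch1.xhtml`.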

Edit.

I think the reason you lose quality during conversion is that, without the original LaTeX code, you’re just importing copied images of the math. If you have access to the original source, you can maintain quality at any window size. One example is arXiv Vanity, which uses the original TeX files on arXiv to re-render papers as HTML. What do you do without the source code? You can try vectorizing the images to get scalable SVGs, which relies on heuristics and doesn’t perfectly recover the typesetting. Or you can try to reverse-engineer the formulas with OpenCV or another ML method; this is pretty hard to do, and the only service I’ve seen that does it is Mathpix, which is paid.

In either case, it’s a strong argument that content should be kept separate from its rendering, which is why I like TeXMe as a concept and why people should avoid PDFs.

> without the original LaTeX code, you’re just importing copied images of math

Not necessarily. See MathML.
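As a small illustration of math carried as markup rather than as an image, this sketch uses Python's standard `xml.etree.ElementTree` to build presentation MathML for E = mc². The function name `e_equals_mc2` is just for illustration; real converters generate this from TeX or from a tagged PDF's structure rather than by hand. Because the result is text, a browser or EPUB 3 reader can scale and reflow it like any other content.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def e_equals_mc2() -> ET.Element:
    """Build presentation MathML for E = m c^2."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    row = ET.SubElement(math, "mrow")
    ET.SubElement(row, "mi").text = "E"   # identifier
    ET.SubElement(row, "mo").text = "="   # operator
    ET.SubElement(row, "mi").text = "m"
    sup = ET.SubElement(row, "msup")      # base with superscript
    ET.SubElement(sup, "mi").text = "c"
    ET.SubElement(sup, "mn").text = "2"   # number
    return math


print(ET.tostring(e_equals_mc2(), encoding="unicode"))
```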