Hacker News
by mr_mitm 879 days ago
One of the selling points of PDF is that it is a single self-contained file. I found this lacking in Sphinx and wrote an extension for it to zip and bundle the assets into a single HTML file: https://github.com/AdrianVollmer/Zundler

Also works with HTML documents produced in other ways.
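To make the idea concrete, here is a minimal Python sketch of the general technique: zip a directory of built HTML, Base64-encode the archive, and embed it in a single HTML file, with a matching extraction step. This is not Zundler's actual implementation; the function names, the `bundle-data` element id, and the wrapper template are all hypothetical, and a real tool would also need JavaScript in the wrapper page to unpack and display the content.

```python
import base64
import io
import re
import zipfile
from pathlib import Path

# Hypothetical wrapper page. A real tool would also ship JavaScript here
# that unpacks the archive in the browser and renders the requested page.
TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Bundled docs</title></head>
<body>
<script id="bundle-data" type="application/octet-stream">{payload}</script>
</body>
</html>
"""


def bundle_directory(src_dir: Path) -> str:
    """Zip every file under src_dir and return one HTML document embedding it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the root so links keep working.
                zf.write(path, path.relative_to(src_dir).as_posix())
    payload = base64.b64encode(buf.getvalue()).decode("ascii")
    return TEMPLATE.format(payload=payload)


def extract_bundle(html: str, dest_dir: Path) -> list[str]:
    """Recover the embedded archive from the HTML and unpack it into dest_dir."""
    match = re.search(
        r'<script id="bundle-data"[^>]*>([A-Za-z0-9+/=]*)</script>', html
    )
    if match is None:
        raise ValueError("no embedded bundle found")
    archive = base64.b64decode(match.group(1))
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())
```

Calling `bundle_directory(Path("build/html"))` would return the text of one HTML file, and `extract_bundle` turns such a file back into the original directory tree.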

3 comments

If you just run sphinx-build with the latex builder and then run xelatex or pdflatex on the result, you'll get one fully consistent PDF with everything in it, including fully functional internal hyperlinks. That's what I do for PDF. I can make big documentation packages this way, building 2000-page PDFs in a minute or two on a modest laptop.
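As a rough sketch of that workflow, the Python helper below assembles the two steps: `sphinx-build -b latex` followed by the LaTeX engine. The directory names and the `.tex` file name are assumptions (the real file name comes from `latex_documents` in the project's `conf.py`), and the helper functions themselves are hypothetical. The engine is run twice because LaTeX generally needs a second pass to resolve cross-references and the table of contents.

```python
import subprocess
from pathlib import Path


def latex_pdf_commands(source_dir, build_dir, tex_file, engine="xelatex"):
    """Return (argv, cwd) pairs for building a PDF via the Sphinx latex builder."""
    latex_dir = str(Path(build_dir) / "latex")
    # Step 1: have Sphinx emit LaTeX sources into <build_dir>/latex.
    sphinx = ["sphinx-build", "-b", "latex", source_dir, latex_dir]
    # Step 2: compile the generated .tex file from inside that directory.
    tex = [engine, "-interaction=nonstopmode", tex_file]
    # Two LaTeX passes so internal links and the TOC settle.
    return [(sphinx, None), (tex, latex_dir), (tex, latex_dir)]


def build_pdf(source_dir, build_dir, tex_file, engine="xelatex"):
    """Run the commands in order, stopping at the first failure."""
    for argv, cwd in latex_pdf_commands(source_dir, build_dir, tex_file, engine):
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `build_pdf("docs", "build", "project.tex")` would use the hypothetical file name `project.tex`; pass `engine="pdflatex"` to use pdflatex instead.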

Wait: also, how is what you're saying different from the built-in singlehtml builder? https://www.sphinx-doc.org/en/master/usage/builders/index.ht...

In the output of the singlehtml builder, you will have the entire document in one single DOM tree. For large documents, even a modern browser on a modern machine will be brought to its knees.

Check out the CPython docs for example: https://adrianvollmer.github.io/Zundler/output/cpython.html

This is a huge document, and having it all rendered naively in one single page would not only be hard to navigate, it would also feel really sluggish, if not crash the browser.

Ah, ok, so you want a PDF-like single file but in HTML in a way that's more efficient/scalable than the built-in singlehtml builder. Ok fair enough.

For my use cases, the default multi-file HTML builds are ok, and I just pound out a latex-builder generated PDF for the archives.

You're getting close to making your own CHM format, which Sphinx could make for you.

I always thought CHM files were a nice self-contained option for multi-page HTML docs. (Though they'd happily execute whatever JavaScript the author embedded in there... Maybe that's why they fell out of favor?)

It would be great if there was an open CHM-like format that was supported by all major browsers. The nice thing about browsers is that everyone already has one installed. They can even open PDFs natively these days. Sadly, they cannot even open EPUBs (which are almost like CHM without interactivity). I believe Firefox used to be able to open EPUBs, not sure what happened.
The "Portable EPUBS" discussion happening nearby is on this subject, too.

https://news.ycombinator.com/item?id=39138042

Edge could. MS cut it out long before the move to the Chromium rendering engine.
Edge supported epub until the bitter end of the Spartan renderer. It was only Microsoft's attempt at an ebook store that died long before that. Admittedly, most people's visibility into Edge epub support was through the Store and the sidebar dedicated to store purchases, but if you had no other book reader app take over the .epub file extension (or if you realized that you could drag and drop DRM-free .epub files into new tabs) Edge would still read them right up to the Chromium switch.
And it was probably the best EPUB reader available on Windows.

Particularly because of the text-to-speech engine features.

I think it was too. I also think a lot of people missed that there was an app in the Microsoft Store, from some team adjacent to the Edge team at the time, with the boring and easy-to-overlook name "Reader" that just had the PDF and EPUB viewers from Edge in a file-based UI instead of browser chrome UI. It was such a useful app and you could set it to default for PDF (in Windows 8 and the early years of 10) and EPUB files (in early Windows 10, with some effort). I never understood why their ebook store effort focused on a sidebar in Edge that didn't work like anything else in Edge instead of beefing up a file-based app like Reader. Reader also died when Edge went to Chromium and I still miss it as a lightweight and fast PDF reader.
Hmm, the disadvantage of your approach is that it unconditionally requires JavaScript, even if the original didn't.

Also, if you're going to embed a giant binary blob, please ship a way to extract it.

Aren't the image blobs embedded in the URLs using Base64-encoded strings rather than using JS?
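For reference, this is roughly what Base64 embedding in a URL looks like in general: a `data:` URL carries the MIME type and the encoded bytes inline, so no script is needed to display it. The helper name `to_data_url` is hypothetical, and this sketch makes no claim about how Zundler itself embeds images.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data: URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be used directly as the `src` of an `<img>` tag. Base64 inflates the payload by about a third, which matters for large documentation sets.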
Yes, it's a trade-off.

Not a bad idea, thanks for the suggestion.