	by mr_mitm 879 days ago
	One of the selling points of PDF is that it is a single self-contained file. I found this lacking in Sphinx and wrote an extension for it to zip and bundle the assets into a single HTML file: https://github.com/AdrianVollmer/Zundler Also works with HTML documents produced in other ways.
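A minimal sketch of the bundling idea, assuming a hypothetical layout rather than Zundler's real one: zip the built HTML directory in memory, base64-encode the archive, and embed it in a single HTML file. The `bundle-payload` element id and the helpers `zip_directory` and `bundle_directory` are illustrative names. A usable bundle would also need a JavaScript loader that unpacks the archive and renders pages from it, which is left out here.

```python
import base64
import io
import zipfile
from pathlib import Path

# Hypothetical id for the element that carries the payload (not Zundler's format).
PAYLOAD_ID = "bundle-payload"


def zip_directory(src_dir: Path) -> bytes:
    """Zip every regular file under src_dir into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Archive names are relative to the root and use forward slashes.
                zf.write(path, arcname=path.relative_to(src_dir).as_posix())
    return buf.getvalue()


def bundle_directory(src_dir: Path) -> str:
    """Return one HTML document that carries the whole directory as base64 text."""
    payload = base64.b64encode(zip_directory(src_dir)).decode("ascii")
    # Base64 output contains only A-Z, a-z, 0-9, "+", "/" and "=", so it cannot
    # accidentally close the <script> element. A JavaScript loader that unzips
    # the payload and displays pages would still be needed; it is omitted here.
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head><body>\n'
        f'<script id="{PAYLOAD_ID}" type="application/octet-stream">{payload}</script>\n'
        "</body></html>\n"
    )
```

Keeping the payload as inert, non-executed script content means the browser never parses the embedded pages as part of the outer document; only the (omitted) loader decides what to display.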

acidburnNSA 879 days ago

If you just run sphinx-build with the latex builder and then run xelatex or pdflatex on the result, you'll get one fully consistent PDF with everything in it, including fully functional internal hyperlinks. That's what I do for PDF. I can make big documentation packages this way, building 2000-page PDFs in a minute or two on a modest laptop.
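The latex-then-PDF workflow can be scripted roughly as below. This is a sketch: `latex_pdf_steps` and `build_pdf` are hypothetical helpers, `main_tex` must match the target file name configured in `latex_documents` in conf.py, and the engine should match the `latex_engine` setting (Sphinx's default is pdflatex). Sphinx's make mode, `sphinx-build -M latexpdf SOURCEDIR BUILDDIR`, does the equivalent in one command by running the Makefile it generates.

```python
import subprocess
from pathlib import Path


def latex_pdf_steps(source_dir, build_dir, main_tex, engine="pdflatex", passes=2):
    """Return (argv, cwd) pairs: one Sphinx LaTeX build, then the PDF engine runs."""
    latex_dir = Path(build_dir) / "latex"
    steps = [(["sphinx-build", "-b", "latex", str(source_dir), str(latex_dir)], None)]
    # Several engine passes let the table of contents and cross-references
    # resolve; Sphinx's generated Makefile uses latexmk, which reruns as needed.
    for _ in range(passes):
        steps.append(([engine, "-interaction=nonstopmode", main_tex], latex_dir))
    return steps


def build_pdf(source_dir, build_dir, main_tex, engine="pdflatex"):
    """Run every step, stopping at the first command that fails."""
    for argv, cwd in latex_pdf_steps(source_dir, build_dir, main_tex, engine):
        subprocess.run(argv, cwd=cwd, check=True)
```

Returning the commands separately from running them keeps the step list easy to inspect or print before anything is executed.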

Wait: also, how is what you're saying different from the built-in singlehtml builder? https://www.sphinx-doc.org/en/master/usage/builders/index.ht...

mr_mitm 879 days ago

In the output of the singlehtml builder, you will have the entire document in one single DOM tree. For large documents, even modern browsers on a modern machine will be brought to their knees.

Check out the CPython docs for example: https://adrianvollmer.github.io/Zundler/output/cpython.html

This is a huge document, and having this all rendered naively in one single page will not only be hard to navigate, it will also feel really sluggish if not crash the browser.

acidburnNSA 879 days ago

Ah, ok, so you want a PDF-like single file but in HTML in a way that's more efficient/scalable than the built-in singlehtml builder. Ok fair enough.

For my use cases, the default multi-file HTML builds are ok, and I just pound out a latex-builder generated PDF for the archives.

markdoubleyou 879 days ago

You're getting close to making your own CHM format, which Sphinx could make for you.

I always thought CHM files were a nice self-contained option for multi-page HTML docs. (Though they'd happily execute whatever JavaScript the author embedded in there... Maybe that's why they fell out of favor?)

mr_mitm 879 days ago

It would be great if there was an open CHM-like format that was supported by all major browsers. The nice thing about browsers is that everyone already has one installed. They can even open PDFs natively these days. Sadly, they cannot even open epubs (which are almost like CHM without interactivity). I believe Firefox used to be able to open epubs, not sure what happened.

WorldMaker 879 days ago

The "Portable EPUBS" discussion happening nearby is on this subject, too.

https://news.ycombinator.com/item?id=39138042

jhoechtl 879 days ago

Edge could. MS cut it out long before the move to the chrome rendering engine.

WorldMaker 879 days ago

Edge supported epub until the bitter end of the Spartan renderer. It was only Microsoft's attempt at an ebook store that died long before that. Admittedly, most people's visibility into Edge epub support was through the Store and the sidebar dedicated to store purchases, but if you had no other book reader app take over the .epub file extension (or if you realized that you could drag and drop DRM-free .epub files into new tabs) Edge would still read them right up to the Chromium switch.

Shorel 879 days ago

And it was probably the best EPUB reader available on Windows.

Particularly because of the text-to-speech engine features.

WorldMaker 879 days ago

I think it was too. I also think a lot of people missed that there was an app in the Microsoft Store, from a team adjacent to the Edge team at the time, with the boring and easy-to-overlook name "Reader", that just had the PDF and EPUB viewers from Edge in a file-based UI instead of browser chrome. It was such a useful app, and you could set it as the default for PDF (in Windows 8 and the early years of 10) and EPUB files (in early Windows 10, with some effort). I never understood why their ebook store effort focused on a sidebar in Edge that didn't work like anything else in Edge instead of beefing up a file-based app like Reader. Reader also died when Edge went to Chromium, and I still miss it as a lightweight and fast PDF reader.

o11c 879 days ago

Hmm, the disadvantage of your approach is that it unconditionally requires Javascript, even if the original didn't.

Also, if you're going to embed a giant binary blob, please ship a way to extract it.
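One way to address the extraction request, sketched under an assumed format: if a bundle stored its assets as a base64-encoded zip inside a `<script id="bundle-payload">` element (a hypothetical layout, not Zundler's documented format), a small helper could recover the original files. `extract_bundle` is an illustrative name.

```python
import base64
import io
import re
import sys
import zipfile
from pathlib import Path

# Assumes a hypothetical layout: a base64-encoded zip inside
# <script id="bundle-payload" ...>...</script>. Not Zundler's documented format.
PAYLOAD_RE = re.compile(
    r'<script id="bundle-payload"[^>]*>\s*([A-Za-z0-9+/=\s]+?)\s*</script>'
)


def extract_bundle(html_text, out_dir):
    """Unpack the embedded archive into out_dir and return the member names."""
    match = PAYLOAD_RE.search(html_text)
    if match is None:
        raise ValueError("no embedded payload found")
    # Drop any line breaks or indentation inside the base64 text before decoding.
    data = base64.b64decode("".join(match.group(1).split()))
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # extractall strips absolute paths and ".." components from member names.
        zf.extractall(Path(out_dir))
        return zf.namelist()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_bundle.py BUNDLE.html OUTPUT_DIR
    html = Path(sys.argv[1]).read_text(encoding="utf-8")
    for name in extract_bundle(html, sys.argv[2]):
        print(name)
```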

3rd3 879 days ago

Aren't the image blobs embedded in the URLs using Base64-encoded strings rather than using JS?
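For context on this question: images can be inlined as `data:` URIs, which browsers render without any JavaScript. The sketch below (hypothetical helpers `to_data_uri` and `inline_images`) rewrites relative `<img src>` attributes that way. It ignores CSS `url()` references and other asset types, and base64 makes each image about 4/3 of its original size.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src attribute of an <img> tag, capturing the URL in group 2.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def to_data_uri(path: Path) -> str:
    """Encode a file as a data: URI, guessing its MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace relative <img src="..."> references with inline data: URIs."""
    def replace(m):
        src = m.group(2)
        # Leave already-inlined and remote images untouched.
        if src.startswith(("data:", "http:", "https:")):
            return m.group(0)
        return m.group(1) + to_data_uri(base_dir / src) + m.group(3)

    return IMG_SRC_RE.sub(replace, html)
```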

mr_mitm 879 days ago

Yes, it's a trade-off.

Not a bad idea, thanks for the suggestion.
