by acidburnNSA 877 days ago
Sphinx and reStructuredText are, IMHO, underrated powerhouses of document building. With extensions, you can hook them up to Zotero (or whatever)-managed bibtex files. You can render to beautiful HTML files, and you get LaTeX PDFs and epubs for free. First-class LaTeX math support, plenty of integrations with things like mermaid and graphviz, and the ability to build super-powerful custom directives to do basically anything. And way simpler/easier than pure LaTeX.

Heck you can even integrate a full-on requirements management system in them using sphinx-needs https://sphinx-needs.readthedocs.io/en/latest/


It is too complex compared to Markdown and hasn't got enough features to be comparable to Latex. And I still (almost) use the same Latex templates that I used at university, 25 years ago.
I feel the complexity is justified. One of the biggest gripes I have with markdown is that you never know whether your markdown implementation is GitHub-flavoured or some other implementation. Not to mention that Sphinx checks that your links / references to other pages exist and gives you warnings if they don't.
One of the selling points of PDF is that it is a single self-contained file. I found this lacking in Sphinx and wrote an extension for it to zip and bundle the assets into a single HTML file: https://github.com/AdrianVollmer/Zundler

Also works with HTML documents produced in other ways.

If you just run sphinx-build with the latex builder and then run xelatex or pdflatex on the result you'll get one fully-consistent PDF with everything in it, including fully functional internal hyperlinks. That's what I do for PDF. I can make big documentation packages this way building 2000 page pdfs in a minute or two on a modest laptop.
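
A minimal sketch of that two-step build, driven from Python. The directory names, the `project.tex` file name, and the choice of two engine passes are assumptions for illustration; Sphinx derives the real .tex name from your conf.py, so adjust accordingly.

```python
import subprocess
from pathlib import Path


def latex_pdf_commands(source_dir, build_dir, tex_name, engine="xelatex", passes=2):
    """Return (argv, cwd) pairs for a Sphinx LaTeX build plus PDF compilation.

    tex_name is hypothetical: Sphinx names the .tex file after settings in
    conf.py, so it has to match what your project actually produces.
    """
    latex_dir = Path(build_dir) / "latex"
    # Step 1: let Sphinx write the LaTeX sources with the latex builder.
    commands = [(["sphinx-build", "-b", "latex", str(source_dir), str(latex_dir)], None)]
    # Step 2: run the TeX engine inside the output directory. Several passes
    # are typically needed before cross-references and the TOC settle.
    for _ in range(passes):
        commands.append(([engine, "-interaction=nonstopmode", tex_name], latex_dir))
    return commands


def build_pdf(source_dir="docs", build_dir="_build", tex_name="project.tex"):
    """Run the commands in order, stopping at the first failure."""
    for argv, cwd in latex_pdf_commands(source_dir, build_dir, tex_name):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `build_pdf()` needs both Sphinx and a TeX distribution on the PATH. Keeping the command list in its own function makes it easy to swap in pdflatex or add passes.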

Wait: also, how is what you're saying different from the built-in singlehtml builder? https://www.sphinx-doc.org/en/master/usage/builders/index.ht...

In the product of the singlehtml builder, you will have the entire document in one single DOM tree. For large documents, even modern browsers on a modern machine will be brought to their knees.

Check out the CPython docs for example: https://adrianvollmer.github.io/Zundler/output/cpython.html

This is a huge document, and having this all rendered naively in one single page will not only be hard to navigate, it will also feel really sluggish if not crash the browser.

Ah, ok, so you want a PDF-like single file but in HTML in a way that's more efficient/scalable than the built-in singlehtml builder. Ok fair enough.

For my use cases, the default multi-file HTML builds are ok, and I just pound out a latex-builder generated PDF for the archives.

You're getting close to making your own CHM format, which Sphinx could make for you.

I always thought CHM files were a nice self-contained option for multi-page HTML docs. (Though they'd happily execute whatever JavaScript the author embedded in there... Maybe that's why they fell out of favor?)

It would be great if there was an open CHM-like format that was supported by all major browsers. The nice thing about browsers is that everyone already has one installed. They can even open PDFs natively these days. Sadly, they cannot even open epubs (which is almost like CHM without interactivity). I believe Firefox used to be able to open epubs, not sure what happened.
The "Portable EPUBS" discussion happening nearby is on this subject, too.

https://news.ycombinator.com/item?id=39138042

Edge could. MS cut it out long before the move to the chrome rendering engine.
Edge supported epub until the bitter end of the Spartan renderer. It was only Microsoft's attempt at an ebook store that died long before that. Admittedly, most people's visibility into Edge epub support was through the Store and the sidebar dedicated to store purchases, but if you had no other book reader app take over the .epub file extension (or if you realized that you could drag and drop DRM-free .epub files into new tabs) Edge would still read them right up to the Chromium switch.
And it was probably the best EPUB reader available on Windows.

Particularly because of the text-to-speech engine features.

Hmm, the disadvantage of your approach is that it unconditionally requires Javascript, even if the original didn't.

Also, if you're going to embed a giant binary blob, please ship a way to extract it.

Aren't the image blobs embedded in the URLs using Base64-encoded strings rather than using JS?
Yes, it's a trade-off.

Not a bad idea, thanks for the suggestion.

I write a fair amount of reports professionally and I use word.

Getting data from my Python analysis into the reports is tedious at best, and updating numbers last minute is hair-pullingly frustrating.

But because of the good wysiwyg I can cheat on my adjustments when I need a graph to go “just there”, I can edit my paragraph wording such that I don’t get an almost completely blank page in between sections, etc, etc which is important to make a good looking report, imho.

How do you go about that with rst? I’d love to write a templated rst file that can be fed from my excel sheets and Python scripts, but how do I go about final layout adjustments?

I've gone a few routes. I have used sphinx's singlehtml builder to make a huge HTML file and then used pandoc to convert it into docx for final adjustments. This worked surprisingly well on a 2000-page document. But it's a bit kludgy.

Another (non-Sphinx) thing you can do is just write (portions of) your docx reports directly from Python using python-docx [1]. I use this approach when people give me strict docx templates that need to be filled in from Python in a very specific way. It can drop data-generated tables in at special placeholder sections and everything.

[1] https://python-docx.readthedocs.io/en/latest/

I will say that I've been more and more happy with just using sphinx straight to pdf for very professional looking reports. Given some latex preamble work in the config you can get it looking quite nice. I haven't personally struggled recently with too many egregious formatting issues on the sphinx-built latex stuff. You do have to swap over to landscape mode for large tables, etc. so it takes some work. But you're right that in many cases, formatting issues do still happen, so YMMV.
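
For anyone wondering what "latex preamble work in the config" can look like, here is a small conf.py fragment. `latex_engine` and `latex_elements` are standard Sphinx settings; the specific values and the choice of the `pdflscape` package are only illustrative assumptions, not what I necessarily use.

```python
# conf.py (fragment): the values below are illustrative, not recommendations.

# Engine used to compile the generated LaTeX; Sphinx defaults to "pdflatex".
latex_engine = "xelatex"

latex_elements = {
    "papersize": "a4paper",
    "pointsize": "11pt",
    # Extra LaTeX injected into the preamble of the generated document.
    "preamble": r"""
\usepackage{pdflscape}  % provides a landscape environment for wide pages
""",
}
```

With a package like that loaded, one way to rotate a large table is to surround it with `raw:: latex` blocks that open and close the `landscape` environment, though how well that plays with a given table is something to check case by case.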

Another neat trick in sphinx is the csv-table directive [2], which loads table data directly from a csv file you have around, which you can obviously get from your xlsx.

[2] https://docutils.sourceforge.io/docs/ref/rst/directives.html...
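
A rough sketch of how a Python analysis script could feed that directive: dump the numbers to CSV and write a small rst snippet that loads it. The function name, file names, and the assumption that the CSV sits next to the rst file are all hypothetical; `:file:` and `:header-rows:` are real csv-table options.

```python
import csv
from pathlib import Path


def write_table(rows, header, csv_path, rst_path, title="Results"):
    """Write rows to a CSV file plus an rst snippet that loads it via csv-table."""
    csv_path, rst_path = Path(csv_path), Path(rst_path)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    # Assumes the CSV lives in the same directory as the rst file, so the
    # bare file name is enough for the :file: option.
    snippet = "\n".join([
        f".. csv-table:: {title}",
        f"   :file: {csv_path.name}",
        "   :header-rows: 1",
        "",
    ])
    rst_path.write_text(snippet, encoding="utf-8")
    return snippet
```

The generated snippet can then be pulled into the main document with an `include` directive, so rerunning the analysis refreshes the table without touching the prose.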

I do something similar for my reports. I write most of it in markdown using Typora and then I export the last draft to docx for fine tuning and distribution (the agencies I work with want docx submissions, not pdf, which always bothers me).

Typora uses pandoc to do the conversion. My reports are mainly text, charts, and lots of math formulae and it works great. You don't get fine adjustment of layout, but I find that a feature not a bug. I see so many people waste time to put a figure in just the right place. It doesn't matter. The goal is clear information transfer so just get the figure in the doc where it makes sense and go on.

There's a lot you can do with latex to automatically import data and update automatically from external sources, and while it might seem counter-intuitive it is much easier and less effort than Word's wysiwyg interface.
I'm jealous of how easy it is to import data when using a structured source code like format such as rst, markdown or latex. I'm sticking with word because I can easily do small layout adjustments like decreasing the margins of a table to make it fit on a page, or easily see when a paragraph is 1 or 2 words too long, causing it to shift all sorts of elements across pages.
You can do that with Latex as well? I use TexStudio which has a preview pane. Any time I make changes I hit f5 and it updates pretty quickly. It's not instant but pretty close to it, and there are already fewer problems with things shifting around because it manages that better than Word does, by design.
I've recently switched to Quarto[0] with RStudio desktop[1] as the editor. It's my preferred approach for all writing now:

1. Great markdown editor with both source and WYSIWYG views

2. Render to a wide range of formats including html, pdf, epub, docx

3. Generate books, web sites, single page docs, presentations

4. Incorporate code (like jupyter) except the source is plain text with fenced blocks

5. Supports code in a number of languages including Python and R.

6. Can use other editors too (iirc there's a plugin for VS Code though never tried it).

7. Built in support for MathJax for mathematical formulae and Mermaid for text-based diagramming with auto inline preview

I prefer it to Word for writing and jupyter for notebooks. No affiliation to Posit, the company that develops both Quarto & RStudio. Just a fan of the products.

--

[0]: https://quarto.org/ [1]: https://posit.co/download/rstudio-desktop/

Try out Typst.

It senses changes to any file and auto-updates the doc lightning fast - it's far better than LaTeX IMO

No HTML export yet. Which this post is about.

I like Typst too, though, and am subscribed to their GitHub issue for HTML export, which will maybe be available some day.

I guess latex is still unbeatable for writing complex math expressions. These days, when I don't need that, I'm happy with AsciiDoc.
Sphinx/reStructuredText supports math in LaTeX input format [1], so you can still go nuts with complex math expressions while still benefitting from the relative simplicity.

[1] https://www.sphinx-doc.org/en/master/usage/restructuredtext/...

Looks like AsciiDoc supports similar latex math blocks [2]. Are there reasons you can't stick with that when doing math?

[2] https://docs.asciidoctor.org/asciidoc/latest/stem/#block

Sphinx supports ReStructuredText and Markdown.

MyST-Markdown supports MathJax and Sphinx roles and directives. https://myst-parser.readthedocs.io/en/latest/

jupyter-book supports ReStructuredText, Jupyter Notebooks, and MyST-Markdown documents:

You can build Sphinx and Jupyter-Book projects with the ReadTheDocs container, which already has LaTeX installed: https://github.com/executablebooks/jupyter-book/issues/991

myst-templates/plain_latex_book: https://github.com/myst-templates/plain_latex_book/blob/main...

GitHub supports AsciiDoc in repos and maybe also wikis?

Is there a way to execute code in code blocks in AsciiDoc, and include the output?

latex2sympy requires ANTLR.

For example: writing complicated expressions involving calculus/matrices. That's not something I need every day, though.
I have documented at least 10 x 10 matrices with rst math directives and found it to be pretty convenient. I don't understand what the benefit of pure latex is in this context.
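
As an illustration of why large matrices are manageable this way, here is a hypothetical helper that turns a nested Python list into an rst `math` directive containing a `bmatrix`. The function name is made up; the `:label:` option is the one Sphinx provides for numbering and cross-referencing equations.

```python
def matrix_math_directive(rows, label=None):
    """Render a nested list as an rst ``math`` directive holding a bmatrix."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")

    lines = [".. math::"]
    if label:
        lines.append(f"   :label: {label}")
    lines.append("")  # blank line separates options from the math content
    lines.append(r"   \begin{bmatrix}")
    for i, row in enumerate(rows):
        # Every row except the last ends with the LaTeX row separator.
        ending = r" \\" if i < len(rows) - 1 else ""
        lines.append("   " + " & ".join(str(value) for value in row) + ending)
    lines.append(r"   \end{bmatrix}")
    return "\n".join(lines) + "\n"
```

Writing the returned string into an .rst file gives the same markup you would type by hand, which is handy when the matrix comes out of a computation.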
pandas.DataFrame().to_latex() [1] and tabulate [2] support latex table output.

[1] https://pandas.pydata.org/docs/reference/api/pandas.DataFram...

[2] https://github.com/astanin/python-tabulate/blob/master/tabul...

Asciidoc supports math blocks, and there's an extension to render them at compile time
Typst.

Typst is better IMO

As a certified grumpy old developer I spent years writing off the "X but in Rust" projects, but I have to confess that a lot of good things with meaningful improvements have come from the rewrite-everything-in-Rust movement.

I've not used Typst and not authored much LaTeX (but worked on a project with a group of scientists who used nothing but LaTeX) and can see obvious advantages to Typst. Same with many, many other Rust libraries.

I think that typically a rewrite in, well anything, can be helpful - simply because the first write wasn't sure of what may work or what the correct model for the system should be, or how to handle specific parts of the system etc.

A rewrite in Rust can be good for those reasons, as it removes the "cruft" of old implementation, but also gets the nice properties of speed and such.

But ultimately the thing I love most about Rust is not even the safety and such - it's the package management and build system. Just look at the horrible python/js scene for how bad packaging and build systems can be, and you'll understand why that basic uniform experience can be so nice.

So funny to me that people assume, oh it's written in Rust, so it must be a rewrite of something else just so they can use Rust.

They never imagine that people choose Rust for something they want to implement anyway and not just to replicate something existing, that they do not want to use since it's not implemented in Rust????

Oh I know there's loads of original Rust work, but you have to acknowledge that the "X, but in Rust" trope exists.
Yep even as a big fan of it...it's definitely a trope. And one that's very easy to either dismiss or make fun of. It would be a bit strange for fans to feel defensiveness or denial over that.
jamiedumont let out a rambunctious laugh to himself.

- Ah, you got me good you meddling kids!

jamiedumont was talking to himself again.

hackerbod slowly leaned over and squinted at the screen.

- Uh Typst?

- Yeah! It’s a typesetting markup language. It’s supposed to be better than things like latex.

- Ok. What’s so funny about it?

- Oh hehe, it’s written in—guess what?

- I dunno?

- Rust!

jamiedumont started giggling but hackerbod remained neutrally unamused.

- Oh come on! Rewrite in Rust? Language zealots? Young adults who can’t program without some Ruby syntax sprinkled in?

- So this “typt” thing—

- Typst.

- Right, Typst, this typesetting thing was created to promote Rust in some way?

- Oh I don’t think so.

- It doesn’t mention Rust on the homepage or something? You know, Written in Rust?

- Nope. Not to my recollection.

- So is it a rewrite of something else in—

- Nope.

- So then what does that have to do with—

- Ah, but you’re missing the bigger picture, hackerbod.

- Ok.

- Year after year of this eye-rolling promotion and nagging, blah blah blah memory unsafety is bad, blah blah this is why we used angle brackets for generics, and these sly bastards went and pulled off the most epic Trojan Horse that I’ve ever seen been—

- And what’s that?

- They made an actually useful language!

hackerbod had to scoot back as jamiedumont fell off his swivel chair because he was laughing so hard. hackerbod scratched his head.

jamiedumont finally recovered from the ab-induced euphoria.

- Ah hackerbod, I hate to admit it but they got me good! Those cursed language zealots got one over on me!

I...I don't know what to make of this!
I wish Textile had won instead of plain Markdown. What are the benefits of Typst over the ConTeXt family?
I agree! I've also been using this for a personal website (for academia). It works like a charm. It's easy to render any equation, and it's fast (because it's not bloated).
Sphinx/rst are a nice middle ground between the simplicity of markdown and the complexity of LaTeX. I used it to generate a lot of HTML docs for test reports. I did try PDF generation via LaTeX and pdflatex for a bit, but stopped once the PDF was breaking several thousand pages.

And it's really tweakable, especially with HTML output, where you can provide your own templates or add in your own CSS/scripts, even manual tags.

Providing my own templates is kind of a weird feature, because that's not really what I want (in the sense "people don't want to buy drills, they want to buy holes") - obviously that's a necessary feature, but I never ever want to make my own template, what I want instead is to have a template that does exactly what I need but that's made and maintained by someone else.

E.g. I don't care about a configurable formatting for bibliography, but I would want a pre-made template that implements the APA bibliography guidelines with all the tiny nuances correctly. I don't want to configure margins for columns, I want a template that does the IEEE formatting standard exactly. (95% compatibility doesn't work, if a single missing feature means the tool can't produce the required document because it's wrong at one spot on page 3, then I'd need to abandon the tool and pick something that works). And crucially, I want the separation between content and formatting so that I can easily take a blob of content that was formatted for one layout and just copy it in a completely different template and have it match the new formatting guidelines, e.g. automatically moving all the image captions to the other side, changing how they're numbered and referenced, etc.

Latex has all this baggage solved, almost everyone who wants a specific format from me will provide a Latex template with their weird typesetting fetishes included, and I just need to provide the content - while any upcoming tool has an uphill battle to become compatible and provide the same things, at the very least pre-made (and well made) templates for all the major formats (each discipline of science generally uses something different).

I forced myself to use it recently, I mostly found it to be both limited (cannot have part of a link in bold or italics) and inconvenient (each line of inline code must be indented).
It does have some limits, for sure. I haven't tried bolding a portion of a URL before.

I have enjoyed including inline code using the literalinclude directive, which allows you to just include sections of code directly from a file on disk. This is great because you can cover your example code with unit tests while also talking about it in docs without replication. You can even use little border comments to mark snippet sections so that it's not sensitive to specific line numbers.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/...
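
Roughly what the border-comment trick looks like. The file name `example.py`, the marker text, and the function are all made up for illustration; only the marked region would show up in the docs, while the test below it keeps the snippet honest.

```python
"""example.py: hypothetical module whose marked snippet is pulled into the docs."""


# docs-snippet-start
def moving_average(values, window):
    """Return the simple moving averages of values over the given window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
# docs-snippet-end


def test_moving_average():
    # The same function that appears in the docs is covered by a unit test.
    assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]
```

On the rst side, the matching directive would be something like:

    .. literalinclude:: example.py
       :language: python
       :start-after: # docs-snippet-start
       :end-before: # docs-snippet-end

Because the markers are matched as text, the snippet survives edits elsewhere in the file that would shift line numbers.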

I simply settled for Texinfo. It has great features exactly for tech documentation.