by chrismorgan 1458 days ago
Markdown is just generally awful because it’s not designed to be extensible. And so people make a total hash of things like this when trying to add custom inline syntax (because it’s not possible to do it compatibly), and abuse preformatted code blocks to do something other than show code. (Seriously, if you make ```mermaid … ``` turn into a diagram, how am I supposed to show syntax-highlighted Mermaid code? Or ```math … ```, same deal. In this regard, I actually prefer the $$ … $$ GitHub have used, for all its problems.) Alternatives like reStructuredText and AsciiDoc are just worlds ahead in sanity. (Markdown’s HTML foundations don’t help, either.)

Markdown is a complete dead end.

I’ve been making a lightweight markup language of my own, and I thought long and hard about this kind of thing, with the goal of making something extremely consistent and easily parseable by human and machine alike. (All popular LMLs are surprisingly hard to parse correctly, so that text editors never have fully correct syntax highlighting unless they use something like LSP-backed highlighting with the real parser.) There’s a common problem with syntax extensions needing semantic understanding before you can actually parse their bodies. I’m using two different syntaxes for the bodies of what I’m calling macros (here shown without arguments to the macros, partly because I’m still not entirely satisfied with any of the syntaxes I’ve tried for them):

  @macro-name{interpreted-macro-body}
  @macro-name`raw-macro-body`
In the case of an interpreted macro body, it will be parsed fully (supporting both block and inline formatting) and fed to the macro; in the case of a raw body, it will be fed to the macro uninterpreted, just like with the `…` monospace code syntax. (If you wanted to pass a monospace code element as the body, that’d be @macro-name{`…`}. In the general context, the monospaced code syntax `…` is basically just shorthand for @code`…`, like **bold** can be shorthand for @bold{bold}.) This would lead to the shortest possible syntax for mathematics being @m`…`, which I think is acceptable, and much more syntactically robust. If you needed backticks inside the body, you’d currently have to use @m{…} syntax, backslash-escaping any special syntax, because I haven’t come up with any satisfactory other syntax (allowing delimiter repetition, like @m```…```, doesn’t solve all cases as you can’t use the delimiter at the start or end of the value, a problem that most LMLs that go this way seem to ignore, e.g. I think there are some things that you genuinely can’t express in reStructuredText because of this, and others have awful syntactic hacks like backslash space being special; I’m contemplating @m#`…`# and @m#{…}# with arbitrary but matching number of hashes, like Rust’s raw strings, but it’s still not as neat, so I could end up just leaving it at “use an interpreted body and escape everything”). All up, I think this raw/interpreted body distinction should work pretty well, and is sound.
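To make the raw/interpreted distinction above a bit more concrete, here is a minimal Python sketch of how the two body forms could be recognised. It is only an illustration under assumptions, not the actual parser for this language: the names `Macro` and `parse_macro`, the macro-name rule (letters, digits, hyphens), and the choice to support the hash-delimited form only for raw bodies are all made up for the example.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) macro-name rule: a letter, then letters, digits or hyphens.
# Group 2 captures optional hashes for the contemplated @m#`…`# form.
MACRO_START = re.compile(r"@([A-Za-z][A-Za-z0-9-]*)(#*)([{`])")


@dataclass
class Macro:
    name: str
    raw: bool   # True for `…` bodies, False for {…} bodies
    body: str   # body text exactly as written, escapes left in place


def parse_macro(text, pos=0):
    """Parse one macro starting exactly at text[pos].

    Returns (Macro, end_index), or None if there is no well-formed macro.
    """
    m = MACRO_START.match(text, pos)
    if m is None:
        return None
    name, hashes, opener = m.groups()
    start = m.end()

    if opener == "`":
        # Raw body: nothing inside is interpreted, so no semantic knowledge
        # of the macro is needed; just look for the matching closer.
        closer = "`" + hashes
        end = text.find(closer, start)
        if end == -1:
            return None
        return Macro(name, True, text[start:end]), end + len(closer)

    if hashes:
        # The @m#{…}# variant is left out of this sketch.
        return None

    # Interpreted body: track brace nesting, honour backslash escapes, and
    # skip over backtick-delimited spans so a "}" inside them does not count.
    depth, i = 1, start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # skip the escaped character
            continue
        if ch == "`":
            close = text.find("`", i + 1)
            if close == -1:
                return None
            i = close + 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return Macro(name, False, text[start:i]), i + 1
        i += 1
    return None  # unterminated body
```

With this sketch, `parse_macro("@m`x^2`")` yields a raw macro whose body is `x^2`, while `parse_macro("@bold{a {b} c}")` yields an interpreted one whose body would then go through the normal block and inline parser. The point is that the delimiter alone decides whether the body is parsed, so a highlighter never has to know what a given macro does.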
4 comments

Markdown is a dead end, and that's why I'm using it. There's only so much syntax you can put into plain text and remain readable more or less as text, as opposed to code.

I, too, would prefer some other language for complex expressions, although that's to keep Markdown simple rather than to obtain power.

I never want to decipher LaTeX in READMEs I read, unless it's a README for a LaTeX library.

I’ve seen all kinds of awful README.md files because people are trying to shoehorn stuff into Markdown that doesn’t fit.

The problem with Markdown is that it isn’t simple; rather, it’s simplistic, and quite a few things that should be simple instead require terrible hacks to work around Markdown and use HTML.

reStructuredText is generally quite a bit better in this regard because it focuses on expressing semantics.

An interesting take, thanks for the input! As a layman (pretty much), I had always frowned a bit upon reST since I never got used to the syntax. That was probably because Markdown was already so popular when I started using it.

One little remark: You can still have highlighted math code blocks in gh's Markdown. The lang here is ```latex. Anyway, I see this might not be satisfactory.

reStructuredText is still useful to look at for inspiration here. It had the concepts of extensible metadata ("field lists"), spans ("interpreted text"), and blocks ("directives"), including things like applying metadata to spans (essentially using footnotes to provide field lists to interpreted text sections, like but better than Markdown's reference style for hyperlinks, which almost no one uses but which was much more common in rST).

I still sometimes wonder whether, if reStructuredText had had better acceptance outside of just the Python community, it might have had a better run at being the "default" versus Markdown's quirkier approach.

https://docutils.sourceforge.io/rst.html

I’m very deeply familiar with reStructuredText, having bent both it and Sphinx to my will in great detail a decade ago.

A decade ago, I thought that maybe if reStructuredText had had a non-Python implementation (most significantly at the time, PHP and JavaScript) it would have conquered instead of Markdown, given how clearly superior it is.

Now, I’ve decided that the fact that reStructuredText wasn’t ported is actually bound up in the reason why it didn’t conquer: it’s too fancy, too complex. Markdown is an awful kludge that was built on regular expressions, duct tape and HTML and knows it’s unsound. It’s very strongly web-native. reStructuredText is beautiful elegance and perfection of form, but aggressively medium-neutral, painfully so when you only care about the web, and difficult to implement, and unforgiving of content errors.

So I still wish reStructuredText had won, but I can understand more clearly now why it didn’t, and in fact never really stood a chance.

It may be worth looking at org-mode syntax. Yes, org-mode is part of Emacs, but the syntax of the file seems to meet some of your requirements, namely parseable by human and machine alike.