by zahlman 485 days ago
I don't think anyone ever really expected to see widespread use of regexes to alter the structure of a Markdown document. Honestly, while something like "look for numbers and surround them with double-asterisks to put them in boldface" is feasible enough (and might even work!), I can't imagine that a lot of people would do that sort of thing very often (or want to) anyway.
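
For what it's worth, the boldface example really is about a one-liner. A minimal sketch in Python, assuming "numbers" means runs of ASCII digits with an optional decimal part (the name `bold_numbers` is made up for illustration):

```python
import re

# Assumption: a "number" is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def bold_numbers(text: str) -> str:
    """Wrap every number in double asterisks (Markdown bold)."""
    return NUMBER.sub(lambda m: f"**{m.group(0)}**", text)


print(bold_numbers("Version 3.11 shipped 42 fixes."))
# prints: Version **3.11** shipped **42** fixes.
```

This knows nothing about the document's structure, so it will just as happily rewrite digits inside code spans, URLs, or text that is already bold (run it twice and you get `****42****`).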

If a document is supposed to have structure - even something as simple as nested lists of paragraphs - it doesn't seem realistic to expect regular text manipulation tools to do a whole lot with it. Something like "remove the second paragraph of the third entry in the fourth bullet-point list" is well beyond any sane use of any regex dialect that might be powerful enough. (Keeping in mind that traditional regexes can't balance brackets; presumably they can't properly track indentation levels either.)

See also: TOML - generally quite human-editable, but still very much structured with potentially arbitrary nesting.

2 comments

> (Keeping in mind that traditional regexes can't balance brackets; presumably they can't properly track indentation levels either.)

You're right: regular expressions are equivalent to finite state machines [1], which lack the unbounded memory needed to handle arbitrarily nested structures [2]. If there is a depth limit, however, it is possible (but painful) to craft a regex to describe the situation. For example, suppose you have a language where angle brackets serve as grouping symbols, like parentheses usually do elsewhere [3]. Ignoring other characters, you could verify balanced brackets up to one nesting level with

  /^(<>)*$/

and two levels with

  /^(<(<[^<>]*>|[^<>])*>)*$/

Don't do this when you have better options.
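
To make the depth-limit point concrete, here is a rough Python sketch that generates a bracket-only pattern for any fixed depth and compares it with a plain counter, which handles any depth. The names `depth_limited_regex` and `balanced` are made up for this example, and the generated pattern accepts only `<` and `>` rather than allowing other characters inside the brackets.

```python
import re


def depth_limited_regex(depth: int) -> re.Pattern:
    """Pattern for strings of < and > balanced to at most `depth` levels."""
    inner = ""  # depth 0 accepts only the empty string
    for _ in range(depth):
        # Each extra level wraps the previous pattern in one more bracket pair.
        inner = f"(?:<{inner}>)*"
    return re.compile(inner)


def balanced(s: str) -> bool:
    """Counter-based check with no depth limit; non-bracket characters are skipped."""
    depth = 0
    for ch in s:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth < 0:  # a closer with no matching opener
                return False
    return depth == 0


two_levels = depth_limited_regex(2)
print(bool(two_levels.fullmatch("<<><>><>")))  # True: never deeper than 2
print(bool(two_levels.fullmatch("<<<>>>")))    # False: needs 3 levels
print(balanced("<<<>>>"))                      # True: the counter doesn't care
```

The pattern has to grow with every extra level you want to allow, while the counter stays the same size; that counter is exactly the memory a finite state machine doesn't have.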

---

[1] https://reindeereffect.github.io/2018/06/24/index.html

[2] As do any machines I can afford, but my money can buy a pretty good illusion.

[3] < and > are not among the typical regex metacharacters, so they make for an easier discussion.

I think that prospect (of programmatically editing the structure of Markdown files) would have made everyone burst out laughing in 2000; if you want to alter stuff programmatically, put it into sexps or some other syntax, such as SGML. A format that looks human-readable but is actually tricky leads to this sort of thing: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...