|
|
|
|
|
by zahlman
485 days ago
|
|
I don't think anyone ever really expected to see widespread use of regexes to alter the structure of a Markdown document. Honestly, while something like "look for numbers and surround them with double-asterisks to put them in boldface" is feasible enough (and might even work!), I can't imagine that a lot of people would do that sort of thing very often (or want to) anyway. If a document is supposed to have structure - even something as simple as nested lists of paragraphs - it doesn't seem realistic to expect regular text manipulation tools to do a whole lot with them. Something like "remove the second paragraph of the third entry in the fourth bullet-point list" is well beyond any sane use of any regex dialect that might be powerful enough. (Keeping in mind that traditional regexes can't balance brackets; presumably they can't properly track indentation levels either.) See also: TOML - generally quite human-editable, but still very much structured with potentially arbitrary nesting. |
|
You're right: Regular expressions are equivalent to finite state machines[1], which lack the infinite memory needed to handle arbitrarily nested structures [2]. If there is a depth limit, however, it is possible (but painful) to craft a regex to describe the situation. For example, suppose you have a language where angle brackets serve as grouping symbols, like parentheses usually do elsewhere [3]. Ignoring other characters, you could verify balanced brackets up to one nesting level with
and two levels with Don't do this when you have better options.---
[1] https://reindeereffect.github.io/2018/06/24/index.html
[2] As do any machines I can afford, but my money can buy a pretty good illusion.
[3] < and > are not among the typical regex metacharacters, so they make for an easier discussion.