by kortex
2474 days ago
I found it extremely helpful. The sheer emphasis of the reply made me very curious why the idea of using regex on xml is so bonkers.
- I will never forget that regex can't parse XHTML
- the reason being, regex is insufficiently powerful
- when I first saw this post, I knew little about regex under the hood, this sent me down a wiki hole of FSMs, pushdown automata and Turing machines
- this misconception is apparently common enough to be madness-inducing to those that know better
- use a hecking xml parser instead
It almost reminds me of a Bill Nye sketch. Teaching through a bit of non-sequitur and absurdism.
Furthermore, the asker specifically needs to distinguish between start tags and self-closing start tags. This is a token-level difference which is typically not exposed by XHTML parsers. So saying "use a parser" is less than helpful.
I have elaborated a bit in a blog post: https://www.cargocultcode.com/solving-the-zalgo-regex/
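To make the token-level point concrete, here is a rough Python sketch (not taken from the blog post). The `START_TAG` pattern and the `start_tags` helper are names made up for illustration, and the pattern assumes quoted attribute values and ignores comments, CDATA sections and other edge cases. It only shows that Python's standard `xml.etree.ElementTree` parser gives the same tree for `<b/>` and `<b></b>`, while a regex applied to single tokens can tell the two forms apart.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for ONE start tag token. It captures the tag name,
# the raw attribute text, and an optional "/" that marks a self-closing tag.
# Assumption: attribute values are always quoted with " or '.
START_TAG = re.compile(
    r"""<([A-Za-z_][\w:.-]*)                           # tag name
        ((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)   # quoted attributes
        \s*(/?)>                                       # optional "/" then ">"
    """,
    re.VERBOSE,
)

def start_tags(markup):
    """Return (name, is_self_closing) for each start tag token, in order."""
    return [(m.group(1), m.group(3) == "/") for m in START_TAG.finditer(markup)]

sample = '<div class="a"><br/><p></p><img src="x.png" /></div>'
tokens = start_tags(sample)

# A tree-building parser discards the distinction: both inputs serialize identically.
same_tree = (
    ET.tostring(ET.fromstring("<a><b/></a>"))
    == ET.tostring(ET.fromstring("<a><b></b></a>"))
)

print(tokens)
print(same_tree)
```

End tags such as `</p>` are skipped because the name has to start with a letter or underscore. This is tokenizing, not parsing: the pattern never tries to match a start tag with its end tag, which is the part regex can't do.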