HN Mirror

	by unlinkr 2477 days ago
	Here is the answer to the question: https://www.cargocultcode.com/solving-the-zalgo-regex/ tl;dr: It can indeed be solved relatively easily with a regex.

2 comments

kortex 2477 days ago

This is a bit out of my wheelhouse, but this feels wrong, or at least naively capable. It feels like this sort of reasoning leads to the kind of bugs (depending on what you use the result of the regex for) that allow for code injection, a la the Equifax hack.

Maybe another HN poster can back me up, or explain why in fact Zalgo is mistaken and CargoCode is correct.

Either way, this sort of complexity is one reason I avoid XML like the plague and keep HTML at arm's length.

lonelappde 2477 days ago

CargoCode is correct. Zalgo simply misread the question because he was so sick of similar but subtly different questions.

unlinkr 2476 days ago

This is what I really hate about the Zalgo answer. It instills in people a vague sense that regular expressions are somehow bad, wrong and dangerous, but without any real arguments or context that would allow you to evaluate whether the feeling is justified.

lol768 2477 days ago

It doesn't work for me with regex101. "The preceding token is not quantifiable" on this part:

  | < (? \w+ )

kortex 2477 days ago

See, this is kinda what I mean. Maybe you can detect tags with regex, but maybe you shouldn't, given the widespread but subtle differences in regex engines.

Perhaps the entire approach of "why are you trying to parse X?" needs to be traced and re-evaluated.

unlinkr 2477 days ago

> Maybe you can detect tags with regex, but maybe you shouldn't...

So what do you think would be a more appropriate choice for writing a tokenizer?
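
To make the tokenizer point concrete, here is a minimal sketch of a regex-driven tokenizer in Python. It is an illustration only, not the code from the linked article: the pattern, the group names (`open`, `close`, `text`, `oname`, `cname`) and the `tokenize` helper are all made up for this example, and it deliberately ignores comments, quoted attribute values containing `>`, and everything else a real HTML tokenizer has to handle.

```python
import re

# Hypothetical token rules: a closing tag, an opening tag, or a run of text.
# re.VERBOSE lets the pattern contain whitespace for readability.
TOKEN = re.compile(
    r"""
      (?P<close> </ \s* (?P<cname> \w+ ) \s* > )
    | (?P<open>  <  (?P<oname> \w+ ) [^<>]* > )
    | (?P<text>  [^<]+ )
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Yield (kind, value) pairs, failing loudly on input no rule matches."""
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"no token rule matches at offset {pos}")
        if m.group("open"):
            yield ("open", m.group("oname"))
        elif m.group("close"):
            yield ("close", m.group("cname"))
        else:
            yield ("text", m.group("text"))
        pos = m.end()


tokens = list(tokenize("<p>hi</p>"))
```

The regex only recognizes individual tokens here; matching opening tags to closing tags is left to whatever consumes the token stream.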

majewsky 2477 days ago

You want (?:, not (?

Without the colon, the parser appears to be interpreting (? as "zero or one instances of (", but ( is not a full expression by itself and therefore cannot be modified with a quantifier.
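
A quick way to check this kind of thing is to ask an engine whether each variant compiles. The sketch below uses Python's `re` module, which is only an assumption for illustration since the thread does not say which regex101 flavor was selected; the `compiles` helper is hypothetical. The wording of the diagnostic differs between engines, so only the pass/fail result is compared.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern in verbose mode."""
    try:
        re.compile(pattern, re.VERBOSE)
        return True
    except re.error:
        return False


broken = r"< (? \w+ )"   # the fragment quoted in the error report
fixed = r"< (?: \w+ )"   # with the colon: a non-capturing group

broken_ok = compiles(broken)
fixed_ok = compiles(fixed)
```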

unlinkr 2476 days ago

I actually meant (?<tag> in order to create a named capture.

unlinkr 2476 days ago

It was supposed to be (?<tag> \w+ ) in order to create a named capture. The <tag> was apparently lost in editing. Thanks for the heads-up.
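
For anyone trying the corrected fragment outside regex101, note that the spelling of named groups is itself engine-dependent. A minimal sketch in Python, assuming the standard `re` module, where the group has to be written `(?P<tag> ...)` rather than `(?<tag> ...)`; the sample input string is made up:

```python
import re

# Python's re spells named groups (?P<name>...); PCRE, .NET and JavaScript
# also accept (?<name>...). re.VERBOSE allows the spaces inside the pattern.
TAG = re.compile(r"< (?P<tag> \w+ )", re.VERBOSE)

match = TAG.search("<div class='x'>")
tag_name = match.group("tag") if match else None
```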