| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mjn 4003 days ago
	The reason I think it's useful to note, even though you would not in practice parse it that way, is that in my opinion it better points to why parsing HTML with regexes is a bad idea. It would be possible to do so with a giant regex, but there are two problems: 1) manually writing regexes to parse nested data structures is bug-prone; and 2) finite-automaton size scales very badly with the choice of nesting limit, which is more of a complexity rather than computability issue. Say you had a recursive data structure with a very small maximum nesting depth, like 5. Should you use a regex then? I would argue still no: there's no computational problem, but writing a correct regex to do so is still bug-prone. At least, writing one manually is. In the case of small finite nesting depths there might occasionally be reasons to mechanically compile something that looks more like EBNF to a DFA or NFA. But something else might well be better. At that point it's just an efficiency question.