Hacker News
by unlinkr 2474 days ago
I think there are a lot of knee-jerk answers because people see "XHTML" and "regex" in the same sentence and immediately think "not possible".

But the actual question is clearly not about matching start tags to end tags or building a DOM or anything like that - which indeed would require a stack. The question is about recognizing start and end tags. You can do that perfectly fine with regular expressions - indeed, many parsers use regular expressions to tokenize the input before parsing.

Furthermore, the question specifically needs to recognize the difference between start tags and self-closing tags, a difference which is not exposed by most XHTML parsers as far as I am aware.
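
To make that concrete, here is a rough sketch of a regex-based tag tokenizer that labels each tag as start, end, or self-closing. The pattern and the names `TAG_RE` and `classify_tags` are my own illustration, not anything from the original question, and it assumes well-formed input with quoted attribute values. It does not handle comments, CDATA sections, or processing instructions, so anything tag-like inside those would also be matched.

```python
import re

# Illustrative pattern (an assumption, not taken from the thread).
# It only recognizes individual tags; it does not pair start tags with end tags.
TAG_RE = re.compile(
    r"""
    <(?P<close>/)?                      # a leading slash marks an end tag
    (?P<name>[A-Za-z_][\w:.-]*)         # tag name
    (?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*   # zero or more quoted attributes
    \s*
    (?P<selfclose>/)?                   # a slash before '>' marks a self-closing tag
    >
    """,
    re.VERBOSE,
)


def classify_tags(text):
    """Return a list of (kind, name) pairs for every tag found in text."""
    tokens = []
    for match in TAG_RE.finditer(text):
        if match.group("close"):
            kind = "end"
        elif match.group("selfclose"):
            kind = "self-closing"
        else:
            kind = "start"
        tokens.append((kind, match.group("name")))
    return tokens


if __name__ == "__main__":
    sample = '<p class="x">hi<br/><img src="a.png" /></p>'
    for kind, name in classify_tags(sample):
        print(kind, name)
```

This is only the tokenizing step: no stack is needed because nothing here checks that tags are balanced or nested correctly.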

1 comment

Sorry, I misread. Indeed, the actual tokenizing of text is usually done with regular expressions (although some parsers don’t need a separate tokenization pass, but that's a detail :).