by unlinkr
2474 days ago
The job (the question asked) is about recognizing start and end tags in XHTML. These are lexical tokens, so regular expressions are a perfectly fine tool for the task. Indeed, many parsers use regular expressions to perform lexical analysis (https://en.wikipedia.org/wiki/Lexical_analysis), aka tokenization. Quoting from Wikipedia: "The specification of a programming language often includes a set of rules, the lexical grammar, which defines the lexical syntax. The lexical syntax is usually a regular language, with the grammar rules consisting of regular expressions". If you disagree that regular expressions are an appropriate tool for lexical analysis, can you articulate why? And what technique do you propose instead?
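To make the point concrete, here is a rough sketch of what tag-level tokenization with a regular expression could look like. The pattern and the names TAG_RE and tokenize_tags are my own illustration, not taken from any particular parser. It assumes quoted attribute values (as XHTML requires) and simplifies tag names to ASCII. It also makes no attempt to skip comments or CDATA sections, which a real lexer would handle with additional token rules.

```python
import re

# Illustrative pattern only: tag and attribute names are simplified to ASCII,
# and comments / CDATA sections are not recognized as separate tokens.
TAG_RE = re.compile(
    r"""
    <(?P<end>/)?                      # opening bracket, optional slash marks an end tag
    (?P<name>[A-Za-z_][\w:.-]*)       # tag name
    (?:\s+[A-Za-z_][\w:.-]*\s*=\s*    # attribute name and equals sign
       (?:"[^"]*"|'[^']*')            # quoted attribute value (may contain a closing bracket)
    )*
    \s*
    (?P<selfclose>/)?                 # optional slash marks an empty-element tag
    >
    """,
    re.VERBOSE,
)


def tokenize_tags(text):
    """Yield (kind, name) pairs for each tag found in text."""
    for m in TAG_RE.finditer(text):
        if m.group("end"):
            kind = "end"
        elif m.group("selfclose"):
            kind = "empty"
        else:
            kind = "start"
        yield kind, m.group("name")


tokens = list(tokenize_tags('<p class="a">hi<br/></p>'))
```

This only produces tokens. Checking that the tags nest correctly is the parser's job and needs a stack, which is exactly the lexer/parser split described above.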
As for the solution, one can take a premade grammar and query the AST.