by arvinjoar
3177 days ago
Yes, a thousand times this. First of all, regex in the wild (e.g. Perl regex) is much more powerful than the CS version, which can only handle regular languages. The "don't use regex to parse HTML" side often concedes this point, but not everyone realizes it. Another thing is that you rarely need to handle all of HTML, only a small subset, and for a lot of cases a regex, even a simple one, is totally fine for that. The true enemy is parsing something that might change over time, and that's totally unrelated to the regex issue.
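To make the "small subset" point concrete, here is a rough sketch in Go. It assumes a page whose links are always written as double-quoted `href` attributes on `<a>` tags; the `extractHrefs` helper and the sample markup are made up for illustration. Note that Go's `regexp` package is RE2-based (no backreferences), so this is the weaker, truly regular kind of regex, and it is still enough for this narrow job.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefRe matches double-quoted href attributes on <a> tags only.
// Assumption: the input never uses single-quoted or unquoted hrefs.
var hrefRe = regexp.MustCompile(`<a\s+[^>]*href="([^"]*)"`)

// extractHrefs is a hypothetical helper that returns every href value
// found by hrefRe, in document order.
func extractHrefs(html string) []string {
	var out []string
	for _, m := range hrefRe.FindAllStringSubmatch(html, -1) {
		out = append(out, m[1]) // m[1] is the captured attribute value
	}
	return out
}

func main() {
	page := `<p><a href="/a">A</a> and <a class="x" href="/b">B</a></p>`
	fmt.Println(extractHrefs(page))
}
```

If the site switches to single quotes or changes its markup, this breaks, but so would any scraper tied to that markup. That is the "might change over time" problem, not a regex problem.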
|
Recently I replaced this with an XML tokenizer I wrote in Go that can deal with invalid or corrupt XML. On top of it I built a state machine so that it can handle different situations.
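For anyone curious what that can look like, below is a minimal sketch of the idea, not the actual tokenizer described above. All names (`Token`, `Tokenize`, the two states) are hypothetical. The assumption is that "tolerant" means the tokenizer never returns an error: anything that does not form a complete tag is passed through as text.

```go
package main

import (
	"fmt"
	"strings"
)

type Kind int

const (
	Text Kind = iota
	StartTag
	EndTag
)

type Token struct {
	Kind  Kind
	Value string
}

type state int

const (
	inText state = iota // outside any tag
	inTag               // after '<', waiting for '>'
)

// Tokenize splits input into tokens and never fails:
// malformed input is degraded to Text tokens instead.
func Tokenize(input string) []Token {
	var tokens []Token
	st, start := inText, 0
	emitText := func(s string) {
		if s != "" {
			tokens = append(tokens, Token{Text, s})
		}
	}
	for i := 0; i < len(input); i++ {
		c := input[i]
		switch st {
		case inText:
			if c == '<' {
				emitText(input[start:i])
				start, st = i, inTag
			}
		case inTag:
			switch c {
			case '>':
				body := strings.TrimSpace(input[start+1 : i])
				switch {
				case strings.HasPrefix(body, "/"):
					tokens = append(tokens, Token{EndTag, strings.TrimSpace(body[1:])})
				case body != "":
					tokens = append(tokens, Token{StartTag, body})
				default:
					emitText(input[start : i+1]) // "<>" is kept as literal text
				}
				start, st = i+1, inText
			case '<':
				// A second '<' before any '>': treat the earlier one as text.
				emitText(input[start:i])
				start = i
			}
		}
	}
	// Leftover input, including an unterminated tag, is emitted as text.
	emitText(input[start:])
	return tokens
}

func main() {
	for _, t := range Tokenize("a < b <c>d</c><e") {
		fmt.Printf("%d %q\n", t.Kind, t.Value)
	}
}
```

This sketch keeps attributes inside the `StartTag` value and gets confused by a `>` inside an attribute value; a real tokenizer would add more states (attribute name, quoted value, comment, CDATA) to the same loop.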