by BiteCode_dev
1129 days ago
So he is using a full-blown parser, but some part of the tokenisation is done with regexes. I call BS.

Also, I'm pretty sure it will miss a stray "<" somewhere (in an attribute, CDATA, JS, etc.) that is not a tag but will confuse the parser.

I have used regexes to parse HTML. It works fine for quick and dirty scripts that need a small chunk of data from a limited sample of pages, which I believe is the message he is trying to convey.

But I'd rather keep the legend of the infamous SO post against parsing HTML with regexes because:

- it will help the people who need it the most to avoid making mistakes
- it's fun, and part of our culture.
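To make the point about angle brackets in attributes and JS concrete, here is a minimal sketch. Everything in it is made up for illustration: the `SAMPLE` markup, the `NAIVE_HREF` pattern and the `HrefCollector` class are my own names, and the regex is a deliberately simple strawman, not the tokenizer from the article being discussed. It compares a naive regex with Python's standard `html.parser` on a snippet that has a ">" inside an attribute value and a tag-like string inside a script.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: a ">" inside an attribute value, plus tag-like
# text sitting inside a JavaScript string literal.
SAMPLE = (
    '<a title="x > y" href="/ok">link</a>'
    '<script>var s = "<a href=\'/fake\'>";</script>'
)

# Strawman regex: grab the href of anything that looks like an <a> tag.
# It assumes a tag never contains ">" before its href attribute.
NAIVE_HREF = re.compile(r'<a\s[^>]*href=["\']([^"\']*)["\']')


def hrefs_with_regex(html):
    """Return href values found by the naive pattern."""
    return NAIVE_HREF.findall(html)


class HrefCollector(HTMLParser):
    """Collect href values from real <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def hrefs_with_parser(html):
    """Return href values found by the standard-library parser."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print("regex :", hrefs_with_regex(SAMPLE))
    print("parser:", hrefs_with_parser(SAMPLE))
```

On this input the regex skips the real link (the ">" in `title` cuts the tag short) and picks up the fake one from the script string, while the parser returns only `/ok`. A smarter pattern could patch each of these cases, but that is the game of whack-a-mole I'd rather people avoid.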