Hacker News new | ask | show | jobs
by karangoeluw 4660 days ago
Regex to parse HTMl is probably the single worst thing you can do.
1 comments

Crafting a wide purpose regex to parse whatever HTML comes in is bad.

Building a regex to extract relevant data from simple, fixed-form page data, bypassing tags irrelevant to the problem at hand is not.

...until the HTML changes.

I haven't look at their parsing code, so I have no idea if it is any better than using a regex, but if the regex assumes too much, simply reordering the attributes in a tag (or something similar) could break a regex-based solution.