| HN Mirror



	by karangoeluw 4660 days ago
	Regex to parse HTML is probably the single worst thing you can do.

1 comments

lloeki 4660 days ago

Crafting a general-purpose regex to parse whatever HTML comes in is bad.

Building a regex to extract relevant data from simple, fixed-form page data, bypassing tags irrelevant to the problem at hand, is not.


untothebreach 4660 days ago

...until the HTML changes.

I haven't looked at their parsing code, so I have no idea whether it is any better than using a regex. But if the regex assumes too much, simply reordering the attributes in a tag (or something similar) could break a regex-based solution.
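
To make the attribute-reordering point concrete, here is a minimal sketch in Python. The markup, the `story` class name, and the helper names (`regex_extract`, `parser_extract`, `StoryLinkParser`) are all made up for illustration and have nothing to do with the parsing code being discussed. It compares a regex that assumes a fixed attribute order with a small extractor built on the standard library's `html.parser`.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the same link with its attributes in two different orders.
ORIGINAL = '<a class="story" href="https://example.com/item">Title</a>'
REORDERED = '<a href="https://example.com/item" class="story">Title</a>'

# A regex that silently assumes `class` always comes before `href`.
STORY_LINK = re.compile(r'<a class="story" href="([^"]+)">')


def regex_extract(html):
    """Return the first story href matched by the fixed-order regex, or None."""
    match = STORY_LINK.search(html)
    return match.group(1) if match else None


class StoryLinkParser(HTMLParser):
    """Collect the href of every <a class="story"> tag, in any attribute order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; order does not matter here.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("class") == "story":
            self.hrefs.append(attributes.get("href"))


def parser_extract(html):
    """Return the first story href found by the parser, or None."""
    parser = StoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs[0] if parser.hrefs else None


if __name__ == "__main__":
    for label, html in (("original", ORIGINAL), ("reordered", REORDERED)):
        print(label, regex_extract(html), parser_extract(html))
```

On the reordered markup the regex finds nothing while the parser-based version still returns the link. A regex can of course be written to tolerate either order, but every such allowance has to be anticipated in advance, which is the "assumes too much" problem.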
