Hacker News
by dpflan 3139 days ago
Jeff Atwood has an entertaining post about parsing HTML with regular expressions:

https://blog.codinghorror.com/parsing-html-the-cthulhu-way/

“That's right, if you attempt to parse HTML with regular expressions, you're succumbing to the temptations of the dark god Cthulhu's … er … code.”

2 comments

Let's not forget about this masterpiece: https://stackoverflow.com/a/1732454/864310
Indeed, its quality cannot be ignored and must be shared; it’s referenced in the Atwood post.
Parsing and scraping are different things though. You don't need to parse a web page to extract specific things from it.
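
To make the scraping point concrete, here is a minimal sketch in Python that pulls link targets out of raw page text with one regular expression, without building a document tree. The sample page, the `HREF_RE` pattern, and the `extract_links` helper are all made up for illustration. It only handles quoted `href` values and will also match ones that sit inside comments or scripts, so treat it as a rough scraper, not a parser.

```python
import re

# Hypothetical sample page; any HTML string would do.
PAGE = """<html><head><title>Example Page</title></head>
<body>
<a href="https://example.com/a">A</a>
<a class="x" href='https://example.com/b'>B</a>
</body></html>"""

# Match a quoted href value anywhere in the text (assumed pattern, not a full HTML grammar).
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(html):
    """Return every quoted href value found in the raw text, in order."""
    return HREF_RE.findall(html)


print(extract_links(PAGE))
```

This is fine when you control or know the page layout and only want one kind of value; once nesting or malformed markup matters, a real HTML parser is the safer choice.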