Hacker News
by hnriot 4850 days ago
He does, but sadly his comment is way off. Anyone who has done any amount of HTML scraping will use BeautifulSoup over lxml. The former is easier and more tolerant of HTML's nuances; the latter is brittle for anything less well formed than XHTML.
2 comments

Sorry, but lxml with ETree will handle any amount of broken HTML you throw at it. Add in XPath and I find lxml to be a far superior, and more memory-efficient, option.

Source: former professional web scraper.
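
For anyone who wants to check the broken-HTML claim themselves, here is a minimal sketch, assuming lxml is installed. The markup and the extract_links helper are made up for illustration; the unclosed <p> and <b> tags and the missing closing </body> and </html> are deliberate.

```python
from lxml import html

# Deliberately malformed markup (hypothetical sample): unclosed <p> and <b>,
# and no closing </body> or </html>.
BROKEN = "<html><body><p>First<p>Second <a href='/a'>A</a> <b>bold <a href='/b'>B</a>"


def extract_links(markup):
    """Parse possibly broken HTML and return (href, text) pairs found via XPath."""
    tree = html.fromstring(markup)  # lxml's HTML parser recovers from the missing tags
    return [(a.get("href"), a.text_content().strip())
            for a in tree.xpath("//a[@href]")]


links = extract_links(BROKEN)
print(links)
```

This only shows that the parser recovers on one small input and that XPath works on the result. It says nothing about memory use or about how the two libraries compare on nastier pages.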

I didn't want to say it, but yeah, I thought BeautifulSoup has way more development.

I wonder if, when you disagree with him, he will unleash his wrath upon ye.