| HN Mirror



	by hnriot 4897 days ago
	He does, but sadly his comment is way off. Anyone who's done any amount of HTML scraping will use BeautifulSoup over lxml. The former is easier and more tolerant of HTML's nuances; the latter is brittle for anything less well-formed than XHTML.

2 comments

mickeyp 4897 days ago

Sorry, but lxml with ETree will handle any amount of broken HTML you throw at it. Add in XPath and I find lxml to be a far superior, and more memory-efficient, option.

Source: former professional web scraper.
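
As a rough illustration of the claim above (not the commenter's own code), here is a minimal sketch that assumes lxml is installed. The sample markup and the `extract_links` helper are made up for this example; it only shows that `lxml.etree` with its recovering `HTMLParser` can parse markup with unclosed tags and then be queried with XPath.

```python
from lxml import etree

# Deliberately malformed markup (hypothetical sample): the <li>, <p>, <b>,
# <body> and <html> tags are never closed.
BROKEN = (
    "<html><body><ul>"
    "<li><a href='/a'>First</a>"
    "<li><a href='/b'>Second</a>"
    "</ul><p>Unclosed paragraph <b>and bold"
)


def extract_links(markup):
    """Return (text, href) pairs for every <a> element that has an href."""
    # HTMLParser recovers from malformed input by default instead of raising.
    root = etree.fromstring(markup, parser=etree.HTMLParser())
    # XPath selects every anchor carrying an href attribute, in document order.
    return [(a.text, a.get("href")) for a in root.xpath("//a[@href]")]


links = extract_links(BROKEN)
for text, href in links:
    print(text, href)
```

Whether this tolerance is enough for a given site is a separate question; the sketch covers only one small case of broken markup.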


berlinbrown 4897 days ago

I didn't want to say it, but yeah, I thought BeautifulSoup had way more development behind it.

I wonder whether, if you disagree with him, he will unleash his wrath upon ye.
