by mickeyp 4852 days ago
Sorry, but lxml with ETree will handle any amount of broken HTML you throw at it. Add in XPath and I find lxml to be a far superior, and more memory-efficient, option.
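
For anyone who hasn't tried it, here's a rough sketch of what I mean. The markup and the `extract_links` helper are made up for illustration; the only real calls are `etree.HTML` and `.xpath`, and it assumes lxml is installed.

```python
from lxml import etree

# Made-up broken markup: unclosed <p> tags, an unquoted attribute,
# and no closing </body> or </html>.
BROKEN = "<html><body><p>First<p>Second <a href=/x>link</a>"


def extract_links(markup):
    """Parse possibly malformed HTML and return every href found on <a> tags."""
    # etree.HTML uses lxml's HTML parser, which recovers from broken markup
    # instead of raising on it.
    root = etree.HTML(markup)
    return root.xpath("//a/@href")


links = extract_links(BROKEN)
paragraphs = etree.HTML(BROKEN).xpath("//p")
print(links, len(paragraphs))
```

The parser closes the dangling tags for you, so the XPath queries run against a normal tree rather than a pile of string hacks.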

Source: former professional web scraper.