Hacker News
by takluyver 4750 days ago
BS4, which is still actively developed, got out of the parser game - it can now use lxml (fast) or html5lib (highly tolerant) to parse the HTML. It's kept the convenient interface to dig into the DOM, and it's kept the UnicodeDammit encoding detection system.
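
For anyone who hasn't tried it, here is a rough sketch of what that looks like in practice. The `make_soup` helper and the sample markup are my own illustration, not part of BS4; it just tries lxml, then html5lib, then the standard library's `html.parser`, and uses whichever is installed. The lookup calls are the same whichever parser ends up building the tree.

```python
from bs4 import BeautifulSoup, FeatureNotFound, UnicodeDammit

# Deliberately sloppy markup: the second <p> is never closed.
HTML = "<html><body><p class='intro'>Hello <b>world</b></p><p>Second"


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: build a soup with the first parser that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # That parser library is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


soup, parser_used = make_soup(HTML)

# The navigation interface does not depend on the parser choice.
intro_text = soup.find("p", class_="intro").get_text()
paragraph_count = len(soup.find_all("p"))

# UnicodeDammit can also be used on its own to decode bytes.
# The second argument is a list of encodings to try first.
dammit = UnicodeDammit("caf\u00e9".encode("latin-1"), ["latin-1"])
decoded = dammit.unicode_markup

print(parser_used, intro_text, paragraph_count, decoded)
```

Passing the parser name explicitly is probably the safer habit, since different parsers can build different trees from the same broken markup.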