Hacker News
by mdaniel 636 days ago
> pretty sure that's what BS uses under the hood?

it's an option[1], and my strong advice is not to use lxml for HTML, since html5lib[2] has the explicitly stated goal of being WHATWG compliant: https://github.com/html5lib/html5lib-python#html5lib

1: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#insta...

2: https://pypi.org/project/html5lib/
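
For anyone wanting to try this, here is a rough sketch of how the parser choice could be wired up. It assumes bs4 is installed and that html5lib and/or lxml may or may not be; `pick_parser` and `parse` are hypothetical helper names for illustration, not part of Beautiful Soup. The strings "html5lib", "lxml" and "html.parser" are the parser names Beautiful Soup accepts as its second argument.

```python
from importlib.util import find_spec

# Preference order (my assumption, following the advice above):
# html5lib first, then lxml, then the stdlib-backed "html.parser".
PREFERRED = ("html5lib", "lxml")


def pick_parser(preferred=PREFERRED):
    """Return the first parser name whose package is importable.

    Falls back to "html.parser", which needs no extra install.
    """
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return "html.parser"


def parse(markup, parser=None):
    """Parse markup with Beautiful Soup using the chosen parser."""
    # Imported lazily so pick_parser() works even without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(markup, parser or pick_parser())
```

Calling `parse("<p>one<p>two")` with different `parser` values is a quick way to see how each one repairs the same broken markup; the resulting trees can differ, which is the reason to pick one parser explicitly rather than rely on whatever happens to be installed.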

1 comment

That's good to know, will try it out. I haven't had many cases of "broken" html in projects where I use lxml but when they do happen it can definitely be a pain.