Hacker News
by chewxy 4845 days ago
Only on well-formed pages. There are many, many, many malformed pages on the internet, even among those created in 2013.
1 comment

Fortunately, HTML5 defines a standard way to parse even broken HTML, and that parsing algorithm is implemented in the html5lib package. You can also use it with lxml, and even use "jQuery-like" selectors via lxml.cssselect (http://lxml.de/cssselect.html).