Hacker News
by
chewxy
4845 days ago
Only on well-formed pages. There are many, many malformed pages on the internet, even among those created in 2013.
1 comment
ville
4845 days ago
Fortunately, HTML5 defines a standard way to parse even broken HTML, and that parser is implemented in the html5lib package. You can also use it with lxml, and even use "jQuery-like" selectors with lxml.cssselect (http://lxml.de/cssselect.html).