by kimagure 4843 days ago
In my experience, PyQuery always seems to be faster than BS4 (for ripping the same information). Has anyone else had a similar experience?

Only on well-formed pages. There are many, many malformed pages on the internet, even among those created in 2013.
Fortunately, HTML5 defines a standard way to parse even broken HTML, and that parser is implemented in the html5lib package. You can also use it with lxml, and even use jQuery-like selectors with lxml.cssselect (http://lxml.de/cssselect.html).