by henrybaxter
5533 days ago
IMO you can get the best of all worlds by using lxml: it supports the selectors you want, it's Python (which I prefer), and in my experience it is more robust than BeautifulSoup. I spent more than a year writing hundreds of scrapers that ran for weeks at a time, and BeautifulSoup did not hold up as well as lxml in practice. On extremely JavaScript-heavy pages we actually used PyV8. Edit: more information at http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciat... (the comments are useful too).
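For anyone who hasn't tried it, here is a rough sketch of what selecting elements with lxml looks like on sloppy markup. The HTML snippet, the `story` class, and the `extract_links` helper are all made up for illustration. It uses XPath because that ships with lxml itself; CSS selectors are also available via `cssselect()`, but that relies on the separate cssselect package being installed.

```python
import lxml.html

# Hypothetical, deliberately sloppy markup: the <li> and <p> tags are never closed.
BROKEN_HTML = """
<html><body>
<ul id="stories">
  <li class="story"><a href="/item?id=1">First story</a>
  <li class="story"><a href="/item?id=2">Second story</a>
  <li class="ad"><a href="/ads">Sponsored</a>
</ul>
<p>unclosed paragraph
</body></html>
"""


def extract_links(html_text):
    """Return (text, href) pairs for links inside <li class="story"> items."""
    # lxml.html uses a forgiving HTML parser, so unclosed tags do not raise.
    root = lxml.html.fromstring(html_text)
    # XPath: any <a> that is a direct child of an <li> with class exactly "story".
    anchors = root.xpath('//li[@class="story"]/a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


if __name__ == "__main__":
    for text, href in extract_links(BROKEN_HTML):
        print(text, href)
```

The same pattern scales to real pages: fetch the HTML however you like, hand the text to `lxml.html.fromstring`, and keep the selectors in one place so they are easy to fix when the site changes.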