Hacker News
by dmn001 3155 days ago
That may be fine for a JavaScript-heavy site with only a few pages, but for anything with more than, say, 1,000 pages it is much more efficient to scrape using requests with lxml. The requests can be made concurrently, the approach scales, and there is no browser overhead from page rendering.
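To make that concrete, here is a minimal sketch of the requests + lxml approach with concurrent fetching. The function names (`fetch`, `parse_title`, `scrape_titles`), the thread-pool size, and the choice of extracting the page title are all assumptions for illustration, not anything from a particular project. It only works when the data is present in the raw HTML, since no JavaScript is executed.

```python
from concurrent.futures import ThreadPoolExecutor

import requests
from lxml import html


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def parse_title(page_source):
    """Extract the <title> text with lxml; no JavaScript is executed."""
    tree = html.fromstring(page_source)
    return (tree.findtext(".//title") or "").strip()


def scrape_titles(urls, fetcher=fetch, max_workers=16):
    """Fetch pages concurrently in a thread pool and map URL -> title.

    `fetcher` is injectable (an assumption of this sketch) so the parsing
    logic can be exercised without touching the network.
    """
    urls = list(urls)  # iterated twice below, so materialise it first
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetcher, urls)  # results come back in input order
        return {url: parse_title(page) for url, page in zip(urls, pages)}
```

Calling `scrape_titles` with a list of page URLs would download them in parallel threads, which suits this I/O-bound work; the pool size is a knob to tune against the target site's rate limits.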

I've done a lot of scraping in my day, and I've found that lxml/requests is 2-3 orders of magnitude more resource-efficient than a Selenium-based browser. That JS/rendering engine is HEAVY!