by chrisohara
5538 days ago
Node.js seemed like a perfect fit for a few reasons: 1. JS selectors make scraping _very_ easy. 2. Asynchronous I/O is fast as it is, but the page is also parsed as it's received; contrast this with other scraping solutions where you need to download a whole page and only parse it once it's complete. 3. With asynchronous scraping it's trivial to handle failures, timeouts, retries, nested requests, recursing similar URLs, concurrent requests, etc. - just add one of the many job options (https://github.com/chriso/node.io/wiki/API---Job-Options)
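To make point 3 a bit more concrete, here is a rough sketch of how timeouts, retries and a concurrency cap compose in plain asynchronous JavaScript. This is not node.io's API or implementation: every name below (`withTimeout`, `withRetries`, `mapConcurrent`, `scrapeAll`, `fetchPage`, and the `timeout` / `retries` / `max` option names) is hypothetical and chosen only for illustration, and `fetchPage` is assumed to be any caller-supplied function that takes a URL and returns a promise.

```javascript
// Hypothetical helpers for illustration only; none of these names come from node.io.

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Call `task()` until it succeeds, allowing up to `retries` extra attempts.
async function withRetries(task, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Run `worker(item)` over `items` with at most `limit` calls in flight.
// Results come back in input order.
async function mapConcurrent(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    // Each lane pulls the next unclaimed index until the input is exhausted.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Scrape every URL with a per-attempt timeout, retries, and bounded concurrency.
// `fetchPage` is an assumed caller-supplied function: (url) => Promise.
function scrapeAll(urls, fetchPage, { timeout = 10000, retries = 2, max = 5 } = {}) {
  return mapConcurrent(urls, max, (url) =>
    withRetries(() => withTimeout(fetchPage(url), timeout), retries)
  );
}
```

Something like `scrapeAll(urls, fetchPage, { timeout: 5000, retries: 3, max: 10 })` would then resolve to one result per URL, and reject if any URL still fails after its last retry. A job-options approach bundles this kind of plumbing behind configuration so you don't write it per scraper.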