by chrisohara 5538 days ago
Node.js seemed like a perfect fit for a few reasons:

1. JS selectors make scraping _very_ easy.

2. Asynchronous I/O is fast as it is, but the page is also parsed as it's received. Contrast this with other scraping solutions, where you need to download the whole page and can only parse it once it's complete.

3. With asynchronous scraping it's trivial to handle failures, timeouts, retries, nested requests, recursion into similar URLs, concurrent requests, etc. Just add one of the many job options (https://github.com/chriso/node.io/wiki/API---Job-Options)
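
To make point 3 a bit more concrete, here is a rough sketch of how timeouts, retries and a concurrency cap can be layered on top of asynchronous requests in plain JavaScript. This is not node.io's API: `withTimeout`, `withRetry`, `mapConcurrent`, `fakeFetch` and `scrapeAll` are hypothetical names made up for illustration, and `fakeFetch` stands in for a real HTTP request so the example runs without a network.

```javascript
// Hypothetical helpers for illustration only; none of this is node.io's API.

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Call `task(attempt)` up to `retries + 1` times and return the first success.
async function withRetry(task, { retries = 2, timeout = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(task(attempt), timeout);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapConcurrent(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const count = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}

// Stand-in for a real HTTP request: fails on the first attempt for every URL.
function fakeFetch(url, attempt) {
  return attempt === 0
    ? Promise.reject(new Error(`transient failure for ${url}`))
    : Promise.resolve(`<html>${url}</html>`);
}

// Scrape all URLs, two at a time, retrying each up to twice.
function scrapeAll(urls) {
  return mapConcurrent(urls, 2, (url) =>
    withRetry((attempt) => fakeFetch(url, attempt), { retries: 2, timeout: 500 })
  );
}
```

Calling `scrapeAll(["a", "b", "c"])` resolves once every URL has succeeded on its retry. In a real scraper the body of `fakeFetch` would be an actual request, and a job-options object like the one linked above would presumably set the retry, timeout and concurrency values for you.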