by ankimal 4853 days ago
They make scraping as easy as finding the right jQuery selectors (once you inject jQuery into the page), but they can be very slow compared to a plain HTML-only scraper.

In my experience, a phantom/casper implementation can take upwards of 5-10 seconds to process a single page (roughly 5-10x slower), even with loading of remote images and plugins disabled.

1 comment

There is a startup penalty for getting the phantomjs executable up (including all of its WebKit internals), but once it is running, I've never had any performance issues. Roll a script using casper.each() and feed it an array of URLs. It is typically very fast for me. You can trap the page-loaded event and do some benchmarking, but I disagree with your premise that PhantomJS/CasperJS is slow.
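
For anyone who wants to measure this rather than argue about it, here is a rough sketch of the casper.each() plus load-event approach described above. The names createLoadTimer and benchmarkUrls and the example.com URL are made up for illustration and are not part of CasperJS. The casper.on(), casper.each(), thenOpen() calls and the load.started / load.finished events are CasperJS's own, as far as I know. Those events can fire more than once per URL (redirects, frames), so treat the numbers as approximate.

```javascript
// Hypothetical helper: pairs each "load started" with the next "load finished"
// and records the elapsed milliseconds. `now` is injectable so the timer can
// be exercised without a browser.
function createLoadTimer(now) {
    var clock = now || Date.now;
    var startedAt = null;
    var samples = [];
    return {
        started: function () { startedAt = clock(); },
        finished: function () {
            // Ignore a "finished" with no matching "started".
            if (startedAt !== null) {
                samples.push(clock() - startedAt);
                startedAt = null;
            }
        },
        summary: function () {
            var total = 0;
            for (var i = 0; i < samples.length; i++) { total += samples[i]; }
            return {
                pages: samples.length,
                totalMs: total,
                meanMs: samples.length ? total / samples.length : 0
            };
        }
    };
}

// Hypothetical wiring: `casper` is assumed to be an instance returned by
// require('casper').create(...), and `urls` an array of strings.
function benchmarkUrls(casper, urls, timer) {
    casper.on('load.started', timer.started);
    casper.on('load.finished', timer.finished);
    casper.start();
    casper.each(urls, function (self, url) {
        self.thenOpen(url, function () {
            this.echo(this.getCurrentUrl());
        });
    });
    casper.run(function () {
        this.echo(JSON.stringify(timer.summary()));
        this.exit();
    });
}

// Inside a CasperJS script (assumption: run with the `casperjs` binary):
//   var casper = require('casper').create({
//       pageSettings: { loadImages: false, loadPlugins: false }
//   });
//   benchmarkUrls(casper, ['http://example.com/'], createLoadTimer());
```

Because the whole URL list runs in one process, the startup penalty is paid once, so the mean per-page figure is what the 5-10 second claim should be compared against.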