by lazerwalker 5204 days ago
In my experience, using jsdom (and other similar node.js DOM libraries) is fine for scraping static content, but tends to fall down when you're dealing with anything that requires executing client-side JS. That's a big deal if you're scraping sites that load in content via XHR, or manipulate CSRF tokens in JS specifically to throw off static scrapers. Both of these are use cases that PhantomJS has handled beautifully for me in the past.
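To illustrate the XHR case, here is a minimal sketch of what a purely static scraper sees. It uses Python's standard-library `html.parser` as a stand-in for jsdom-style static parsing; it does not use jsdom or PhantomJS. The page markup, the `results` element id, the `/api/items` endpoint and the helper names (`TextInId`, `static_text`) are all hypothetical. Because no script runs, the container that client-side JS would fill in comes back empty.

```python
from html.parser import HTMLParser

# Hypothetical page: #results is empty in the served HTML and would only be
# filled in by client-side JS after the (hypothetical) /api/items XHR returns.
PAGE = """
<html><body>
  <div id="results"></div>
  <script>
    fetch('/api/items').then(r => r.json()).then(items => {
      document.getElementById('results').textContent = items.join(', ');
    });
  </script>
</body></html>
"""


class TextInId(HTMLParser):
    """Collect the text inside the element with a given id. No JS is executed."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while inside the target element (nesting level)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_text(html, element_id):
    """Return the text a static parser finds inside #element_id."""
    parser = TextInId(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(repr(static_text(PAGE, "results")))  # ''
```

A headless browser like PhantomJS would instead run the `<script>`, let the request complete, and then expose the populated DOM, which is why it copes with this kind of page.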