by whoisjuan 1409 days ago
It depends on what you are trying to accomplish, but I think a combination of Puppeteer and JSDOM or Cheerio should take you far. Where it gets complex is when you need to do things such as rotating IPs, but in my experience, that's only needed if you're engaging in a heavy scraping workload.
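For anyone wondering what rotating IPs looks like in practice, here is a minimal round-robin sketch. The proxy addresses and the helper names (`makeProxyRotator`, `launchArgsFor`) are made up for illustration. The only real piece is Chromium's `--proxy-server` switch, which you would pass to Puppeteer through the `args` launch option. A real setup would probably also need health checks and retries.

```typescript
// Hypothetical proxy pool: placeholder addresses, not real endpoints.
const PROXIES: string[] = [
  "http://10.0.0.1:8080",
  "http://10.0.0.2:8080",
  "http://10.0.0.3:8080",
];

// Returns a function that hands out proxies in round-robin order.
function makeProxyRotator(proxies: string[]): () => string {
  if (proxies.length === 0) {
    throw new Error("proxy pool is empty");
  }
  let next = 0;
  return () => {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length; // wrap around at the end of the pool
    return proxy;
  };
}

// Builds the Chromium launch args for one browser session.
// With Puppeteer this would be used as:
//   puppeteer.launch({ args: launchArgsFor(nextProxy()) })
function launchArgsFor(proxy: string): string[] {
  return [`--proxy-server=${proxy}`];
}

const nextProxy = makeProxyRotator(PROXIES);
```

Each browser launch takes the next proxy from the pool, so consecutive sessions go out through different addresses. Rotating per launch rather than per request keeps the sketch simple, since the proxy is fixed when Chromium starts.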

Puppeteer + JSDOM is what I used to build https://www.getscrape.com, which is a high-level web scraping API. Basically, you tell the API whether you want links, images, text, headings, numbers, etc., and it returns them without you having to pass selectors or parsing instructions.
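To give a rough idea of the "no selectors" approach, here is one way it could be sketched: a fixed table maps each content kind to a selector, so the caller only names the kinds. This is an illustration under my own assumptions, not the actual getscrape implementation. The names `RULES`, `extract`, `Kind`, `ElementLike`, and `QueryRoot` are all hypothetical, and the small interfaces exist only so the snippet runs without a DOM library (a JSDOM `window.document` should fit `QueryRoot`).

```typescript
// Minimal structural types so the sketch does not depend on a DOM library.
interface ElementLike {
  getAttribute(name: string): string | null;
  textContent: string | null;
}
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

type Kind = "links" | "images" | "headings";

// Hypothetical rule table: the caller never sees these selectors.
const RULES: Record<Kind, { selector: string; read: (el: ElementLike) => string | null }> = {
  links: { selector: "a[href]", read: (el) => el.getAttribute("href") },
  images: { selector: "img[src]", read: (el) => el.getAttribute("src") },
  headings: {
    selector: "h1, h2, h3, h4, h5, h6",
    read: (el) => el.textContent?.trim() ?? null,
  },
};

// Collects the requested kinds of content from a parsed document.
function extract(root: QueryRoot, kinds: Kind[]): Partial<Record<Kind, string[]>> {
  const result: Partial<Record<Kind, string[]>> = {};
  for (const kind of kinds) {
    const { selector, read } = RULES[kind];
    const values: string[] = [];
    for (const el of Array.from(root.querySelectorAll(selector))) {
      const value = read(el);
      if (value) values.push(value); // skip missing or empty values
    }
    result[kind] = values;
  }
  return result;
}
```

With JSDOM, a call would look something like `extract(new JSDOM(html).window.document, ["links", "headings"])`, where `html` is the page source Puppeteer returned.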

Mentioning the API in case anyone here wants something straightforward; it works well for building generic scraping operations.