by sxp
2059 days ago
+1 to using cheerio.js. When I need to write a web scraper, I use Node's `request` library to get the HTML text and cheerio to extract links and resources for the next stage. I've also used cheerio to save a functioning local cache of a webpage: I have it rewrite all the various multi-server references in <img>, <a>, <script>, etc. on the page to locally valid URLs and then fetch those resources.
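For anyone curious, the local-cache trick can be sketched roughly like this. It is a minimal sketch, not my actual code: the `toLocalPath` mapping (host name plus path, query string dropped) and the `localizeReferences` helper are hypothetical names made up for illustration, and it assumes cheerio has been installed from npm. Fetching is left out; the function only returns the list of original URLs to download.

```javascript
const path = require('node:path');

// Hypothetical mapping from an absolute URL to a relative local path,
// e.g. https://cdn.example.com/img/a.png -> cdn.example.com/img/a.png
// Query strings and fragments are dropped in this sketch.
function toLocalPath(absoluteUrl) {
  const u = new URL(absoluteUrl);
  // Give directory-style URLs a file name so they can be saved to disk.
  const pathname = u.pathname.endsWith('/') ? u.pathname + 'index.html' : u.pathname;
  return path.posix.join(u.hostname, pathname);
}

// Rewrite src/href attributes to local paths and collect the original
// absolute URLs so a later stage can download them.
function localizeReferences(html, pageUrl) {
  // Third-party dependency (npm install cheerio); required here so that
  // toLocalPath above can be used without it.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  const toFetch = [];
  const targets = { img: 'src', script: 'src', a: 'href' };

  for (const [tag, attr] of Object.entries(targets)) {
    $(`${tag}[${attr}]`).each((_, el) => {
      let absolute;
      try {
        // Resolve relative references against the page's own URL.
        absolute = new URL($(el).attr(attr), pageUrl);
      } catch {
        return; // leave unparsable values untouched
      }
      // Skip mailto:, javascript:, data: and similar schemes.
      if (absolute.protocol !== 'http:' && absolute.protocol !== 'https:') return;
      toFetch.push(absolute.href);
      $(el).attr(attr, toLocalPath(absolute.href));
    });
  }
  return { html: $.html(), toFetch };
}
```

You would call `localizeReferences(html, pageUrl)` on the fetched page, write the returned `html` to disk, and download each entry of `toFetch` to its `toLocalPath` location. A real version would also need to handle `<link>`, `srcset`, and URLs that differ only by query string.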
(I'm a maintainer of jsdom.)