Hacker News
by nomilk 406 days ago
That's awesome. Thanks for sharing.

First time hearing of the fetch() approach! If I understand correctly, regular browser automation typically makes a separate GET request for each page. The fetch() strategy instead makes a GET for the first page (just as with regular browser automation) and then, after satisfying Cloudflare, uses fetch(<url>) to retrieve the rest of the pages you're after rather than issuing another GET for each one.

This approach is less noisy and has less impact on the server, so it is less likely to get noticed by bot detection.
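
For anyone else new to this, here is a rough sketch of what I understand the idea to be. It assumes the code runs inside the already-loaded page (for example in the devtools console or via an automation tool's evaluate hook) after the challenge has been passed. The name fetchPages and the injectable fetchFn parameter are my own, not something from the parent comment.

```javascript
// Sketch only: fetchPages and fetchFn are hypothetical names, not from the thread.
// Meant to run in the page context once the first page has loaded normally.
async function fetchPages(urls, fetchFn = fetch) {
  const pages = [];
  for (const url of urls) {
    // credentials: "include" asks the browser to send the cookies it already holds.
    const res = await fetchFn(url, { credentials: "include" });
    if (!res.ok) {
      throw new Error(`HTTP ${res.status} for ${url}`);
    }
    // Collect the raw response body; parsing is left to the caller.
    pages.push(await res.text());
  }
  return pages;
}
```

The pages are requested one at a time on purpose, which keeps the request rate low. Passing fetchFn in is only there so the function can be tried with a stub instead of a live site.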

This is fascinating stuff. (I'd previously used very little JavaScript in scrapes, preferring Ruby, R, or Python, but this may tilt my tooling preferences toward using more JS.)

2 comments

Almost. I mean, it's not like fetch() is going to lead to some esoteric kind of HTTP request method. I am guessing the parent comment says what it says because fetch will use the cookies and other crumbs set by the successful completion of the captcha. If you can take all those crumbs and include them in your next GET request, you don't need to resort to fetch.
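
To make the "take the crumbs along" idea concrete, here is a minimal sketch, assuming the cookies have been exported from the browser session as { name, value } objects. The helper names buildCookieHeader and buildRequestOptions are hypothetical, and sending the browser's User-Agent along is my assumption about what else a site might check.

```javascript
// Sketch only: both helpers are hypothetical names, not from the thread.
function buildCookieHeader(cookies) {
  // cookies: array of { name, value } objects exported from the browser session.
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

function buildRequestOptions(cookies, userAgent) {
  return {
    method: "GET",
    headers: {
      Cookie: buildCookieHeader(cookies),
      // Assumption: the site may also expect the same User-Agent as the browser.
      "User-Agent": userAgent,
    },
  };
}
```

The resulting options object would be handed to an HTTP client outside the browser that lets you set the Cookie header yourself. Inside a page, the browser manages that header and scripts cannot set it.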
Scammers will use fingerprints from their victims' browser/IP/geolocation to try to impersonate them. You can basically buy not only stolen credentials from such vendors but also the environment in which to run them "safely".
First time hearing about fetch too, but I don't see the advantage. Is fetch reusing the connection while a manual page load is not?