by nomilk
406 days ago
That's awesome. Thanks for sharing. This is the first time I've heard of the fetch() approach! If I understand correctly, regular browser automation typically makes a separate GET request for each page. The fetch() strategy, by contrast, makes a GET for the first page (just as with regular browser automation); then, after satisfying Cloudflare, rather than moving on to the next GET request, it uses fetch(<url>) to retrieve the rest of the pages you're after. This approach is less noisy and has less impact on the server, so it is less likely to be noticed by bot detection. This is fascinating stuff. (I'd previously used very little JavaScript in scrapes, preferring Ruby, R, or Python, but this may tilt my tooling preferences toward using more JS.)
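For what it's worth, here is a rough sketch of how I picture the fetch() step, assuming it runs inside a page that has already loaded and cleared the challenge. The names fetchPages, fetchImpl, and delayMs are made up for illustration and are not from the parent comment; the only real API used is the standard fetch() call with its credentials option.

```javascript
// Sketch only: retrieve several pages with fetch() from inside a page that
// is already loaded, instead of navigating to each one.
// `fetchPages`, `fetchImpl`, and `delayMs` are hypothetical names.
async function fetchPages(urls, fetchImpl = globalThis.fetch, delayMs = 0) {
  const pages = [];
  for (const url of urls) {
    // credentials: "include" asks the browser to send the cookies it already holds.
    const response = await fetchImpl(url, { credentials: "include" });
    if (!response.ok) {
      throw new Error(`Request for ${url} failed with status ${response.status}`);
    }
    pages.push({ url, html: await response.text() });
    if (delayMs > 0) {
      // Optional pause between requests to keep the load on the server low.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return pages;
}
```

The requests go out one at a time rather than in parallel, which is a guess at what "less noisy" would mean in practice. Passing fetchImpl as a parameter is only there so the function can be tried with a stub outside a browser.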