Hacker News
by spsphulse 1961 days ago
Is there a SOTA library for common web scraping issues at scale (especially when distributed over a cluster of nodes), such as CAPTCHA detection, IP rotation, rate throttling, queue management, etc.?

What's a "SOTA library"?
A contextual guess: "'State of the art' library"

In other words: is there a drop-in library that solves all the big, common issues people run into when scraping websites in the wild?

At least, that's how I read it.

There is no "state of the art" library for building your own Google. But rate throttling/limiting can be done with Redis, and IP rotation is still rate limiting with Redis (one counter per IP). CAPTCHA detection: you have to pay $$, I think.
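To make the Redis suggestion a bit more concrete, here is a minimal sketch of a fixed-window rate limiter, plus proxy selection built on top of it. Everything named here is an assumption for illustration: `FakeRedis`, `allow`, `pick_proxy`, the `rate:<host>:<proxy>` key layout, and the proxy names are all hypothetical. `FakeRedis` is an in-memory stand-in that mimics only the INCR and EXPIRE commands, so the example runs without a server.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for the two Redis commands used below."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> [count, expires_at or None]

    def _purge(self, key):
        # Drop the key if its time-to-live has elapsed.
        entry = self._data.get(key)
        if entry and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]

    def incr(self, key):
        self._purge(key)
        entry = self._data.setdefault(key, [0, None])
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        if key in self._data:
            self._data[key][1] = self._clock() + seconds
            return True
        return False


def allow(client, key, limit, window_seconds):
    """Fixed-window limiter: at most `limit` hits per window for `key`."""
    count = client.incr(key)
    if count == 1:
        # First hit in this window: start the countdown.
        client.expire(key, window_seconds)
    return count <= limit


def pick_proxy(client, proxies, target_host, limit, window_seconds):
    """Return the first proxy still under its per-host budget, or None."""
    for proxy in proxies:
        key = f"rate:{target_host}:{proxy}"  # assumed key layout
        if allow(client, key, limit, window_seconds):
            return proxy
    return None  # every proxy is exhausted; the caller should back off
```

With a real server, a redis-py `redis.Redis()` client should be usable in place of `FakeRedis`, since it also exposes `incr` and `expire`. Note that the two calls are not atomic here: if a worker dies between them, the key never expires, so a production version would probably wrap them in a Lua script or a pipeline.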