by Nimsical
3382 days ago
I guess it's not truly distributed in that sense. StdLib uses AWS Lambda, which has widely varying IPs, and I believe it's multi-region. I haven't hit a wall or been caught doing any scraping. But then again, I haven't done it at a rate of 10k pages/sec or anything like that.
|
I have wanted to implement something like this, i.e. Lambda doing the downloading of the page itself.
I have wondered how it would work with very strict sites like Yelp, whose limits are similar to what you would get through their API (so it doesn't make sense not to use their API).
What are your stats like, if you don't mind? How many people are using it, and how many are getting blocked (404 or 500 after 1000 requests, etc.)?
Edit: Is it possible to use my own credentials for AWS?
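
For what it's worth, here is a rough sketch of what I mean by Lambda doing the download itself. It assumes a Python runtime and uses only the standard library. The `url` event field, the `fetch` helper, and the response shape are my own placeholders, not anything from StdLib's actual API.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Download a page and return (status_code, body_text)."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Blocked or missing pages end up here (e.g. 403, 404, 429, 500).
        return err.code, ""


def lambda_handler(event, context, fetcher=fetch):
    # `event["url"]` is a hypothetical input field chosen for this sketch.
    url = event.get("url")
    if not url:
        return {"statusCode": 400, "body": "missing 'url'"}
    status, body = fetcher(url)
    # Truncate the body so the response stays small.
    return {"statusCode": status, "length": len(body), "body": body[:1000]}
```

The `fetcher` argument is only there so the handler can be tried locally without network access. Counting the non-200 status codes returned per target site would give the kind of blocking stats I'm asking about.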