by minimaxir
3739 days ago
If the service has an API with a fair rate limit (Foursquare's is 5000 requests/hour), I believe that is OK, since it implies their architecture is built for massive data requests. On the other hand, bypassing those rate limits with proxies is definitely bad. If a website does not have an API (e.g. BuzzFeed), I take care to collect only the data I need, and nothing that would damage the business (e.g. entire articles). Consequently, I sanitize the data of such things if I decide to release the dataset.
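To make the "stay within the rate limit" point concrete, here is a minimal sketch of a client-side throttle, assuming a limit expressed as requests per hour (5000/hour works out to one request every 0.72 seconds). The `Throttle` class and its parameters are hypothetical names for illustration, not part of any real API client.

```python
import time


class Throttle:
    """Hypothetical helper: space calls at least 3600/max_per_hour seconds apart."""

    def __init__(self, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between consecutive requests, in seconds.
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may go out

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Usage: call throttle.wait() immediately before each API request.
throttle = Throttle(max_per_hour=5000)
```

Even spacing is the simplest choice here; it gives up burst capacity in exchange for never getting close to the published limit.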