by a1369209993
1341 days ago
> but this just gave them more errors to ignore and retry.

So null-route the offending IPs on a [0]24-hour timeout? The problem you're describing isn't "scraping", it's "low-grade denial-of-service attack (that you suspect might be a result of attempted scraping)", and should be addressed accordingly. (The parenthesised part doesn't really matter.)

0: exponentially increasing up to -, for automated versions, but you're presumably already familiar with the current batch of offending source addresses.

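For the automated version mentioned in the footnote, a minimal sketch of the timeout calculation might look like the following. The function name `ban_seconds` is hypothetical, doubling per repeat offence is only one reading of "exponentially increasing", and the cap is left as a required parameter because the comment does not say what it should be.

```python
def ban_seconds(offence_count: int, max_seconds: float,
                base_seconds: float = 24 * 3600) -> float:
    """Length of the null-route for the Nth offence (1-based).

    Assumption: the timeout starts at 24 hours and doubles on every
    repeat offence, never exceeding max_seconds (caller-supplied cap).
    """
    if offence_count < 1:
        raise ValueError("offence_count must be >= 1")
    # Clamp the exponent so a very large offence count cannot overflow
    # when base_seconds is a float.
    exponent = min(offence_count - 1, 64)
    return min(base_seconds * 2 ** exponent, max_seconds)
```

The returned number of seconds would then be handed to whatever actually installs the null route (firewall rule, blackhole route, and so on), which is outside the scope of this sketch.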
Also, double check that your first-stage throttling actually increases the latency of the requests, such that a user-agent that doesn't issue multiple requests concurrently (but starts a new request immediately on receiving a response) will automatically self-rate-limit. This should be standard for any 'serious' HTTP server, but I've seen a few that incorrectly go straight from "serve 200 OK instantly" to "serve 429 Too Many Requests, also instantly", rather than going to "serve 200 OK after ~1 second" and sending 429 only when there are actually too many requests (in particular, more than one in flight at any given time).
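
As a rough illustration of that latency-first behaviour, here is a minimal sketch of the decision logic for a client that is already being throttled. The class `LatencyThrottle` and its method names are hypothetical, the 1-second delay is taken from the "~1 second" above, and counting in-flight requests per client is an assumption about how "more than one at any given time" would be tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LatencyThrottle:
    delay_seconds: float = 1.0   # how long to hold a throttled response
    max_in_flight: int = 1       # concurrent requests allowed per client
    in_flight: Dict[str, int] = field(default_factory=dict)

    def begin(self, client: str) -> Tuple[int, float]:
        """Return (status, delay) for a new request from a throttled client.

        429 is returned only when the client already has max_in_flight
        requests outstanding; otherwise the request is accepted and the
        caller should wait `delay` seconds before sending the 200.
        """
        current = self.in_flight.get(client, 0)
        if current >= self.max_in_flight:
            return 429, 0.0
        self.in_flight[client] = current + 1
        return 200, self.delay_seconds

    def end(self, client: str) -> None:
        """Call once the delayed response has been sent."""
        remaining = self.in_flight.get(client, 0) - 1
        if remaining > 0:
            self.in_flight[client] = remaining
        else:
            self.in_flight.pop(client, None)
```

A strictly sequential client that fires its next request as soon as the previous response arrives never sees a 429 here; it is simply slowed to about one request per `delay_seconds`. Only a client with overlapping requests gets rejected.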