
[Too late to edit:]

Also, double-check that your first-stage throttling actually increases the latency of requests, so that a user-agent that doesn't issue requests concurrently (but starts a new request as soon as it receives a response) automatically rate-limits itself. This should be standard for any 'serious' HTTP server, but I've seen a few that incorrectly go straight from "serve 200 OK instantly" to "serve 429 Too Many Requests, also instantly", instead of first moving to "serve 200 OK after ~1 second" and sending 429 only when there actually are too many requests (in particular, more than one in flight at any given time).
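A minimal sketch of that two-stage behaviour, assuming an asyncio-style server and per-client bookkeeping. The class name `TwoStageThrottle`, the sliding-window budget (`free_requests` per `window` seconds) and the `serve` callback are hypothetical illustrations, not any particular server's API; only the ~1 second delay comes from the comment above. Once a client exceeds its budget, it still gets 200 OK, just slower; 429 is returned only if that client already has a request outstanding.

```python
import asyncio
import time
from collections import deque


class TwoStageThrottle:
    """Hypothetical per-client throttle: delay first, reject only concurrency.

    Assumptions (not from the original post): a sliding window of `window`
    seconds, `free_requests` undelayed requests per window, and a fixed
    `delay` added to every request once a client exceeds that budget.
    """

    def __init__(self, free_requests=5, window=10.0, delay=1.0):
        self.free_requests = free_requests
        self.window = window
        self.delay = delay
        self.history = {}    # client_id -> deque of request start times
        self.in_flight = {}  # client_id -> number of requests being served

    def _over_budget(self, client_id, now):
        times = self.history.setdefault(client_id, deque())
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        times.append(now)
        return len(times) > self.free_requests

    async def handle(self, client_id, serve):
        """Return (status, body); `serve` is an async callable producing the body."""
        over = self._over_budget(client_id, time.monotonic())
        busy = self.in_flight.get(client_id, 0) > 0
        if over and busy:
            # Stage 2: over budget AND another request from this client is
            # still outstanding, i.e. it is issuing requests concurrently.
            return 429, "Too Many Requests"
        self.in_flight[client_id] = self.in_flight.get(client_id, 0) + 1
        try:
            if over:
                # Stage 1: slow the response instead of rejecting it, so a
                # sequential client self-rate-limits to ~1 request per `delay`.
                await asyncio.sleep(self.delay)
            return 200, await serve()
        finally:
            self.in_flight[client_id] -= 1


async def demo():
    # Small numbers so the demo runs quickly; real values would be larger.
    throttle = TwoStageThrottle(free_requests=2, window=60.0, delay=0.05)

    async def serve():
        return "OK"

    # A sequential client: never sees 429, but requests past the budget are slower.
    statuses, durations = [], []
    for _ in range(4):
        start = time.monotonic()
        status, _ = await throttle.handle("sequential", serve)
        statuses.append(status)
        durations.append(time.monotonic() - start)

    # A concurrent client: uses up its budget, then fires two requests at once.
    await throttle.handle("concurrent", serve)
    await throttle.handle("concurrent", serve)
    concurrent = await asyncio.gather(
        throttle.handle("concurrent", serve),
        throttle.handle("concurrent", serve),
    )
    return statuses, durations, concurrent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, the sequential client gets four 200s, with the last two delayed; the concurrent client gets one delayed 200 and one immediate 429. Whether attempts that were answered with 429 should also count against the budget (as they do here) is a policy choice.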