by derefr 1776 days ago
I like the “jitter based on API key” idea.

It’s somewhat hard in our case: our direct customers (like the mobile app I mentioned) have API keys with us, but they don’t tell us which of their users is making a given request. And often they’ll run an HTTP gateway (in part so that they don’t have to embed their API key for our service in their client app), so we don’t get to see the originating user IPs for these requests, either. We just get huge spikes of periodic traffic, all from the same IP, all with the same API key, all about different things, and all delivered over a bunch of independent, concurrent TCP connections.

I’ve been considering a few options:

- Require customers that have such a “multiple users behind an API gateway” setup to tag their proxied requests with per-user API sub-keys, so we can jitter/schedule based on those.

- Since these customers like API gateways so much, we could just build a better API gateway for them to run; one that benefits us (e.g. by Nagle-ing requests together into fewer, larger batch requests). Requests that arrive as a single large batch could be scheduled by our backend at an optimal concurrency level, rather than us having to deal with huge concurrency bursts as we do now.

- Force users to rewrite their software to “play nice”, by introducing heavy-handed rate-limiting. Try to tune it so that the only possible way to avoid 429s is to either do gateway-side request queuing, or to introduce per-client schedule offsets (i.e. placing users on a hash ring by their ID, so that for a request made once every 5 minutes, roughly equal numbers of client apps are set to make the request at T+0 vs. T+2.5 minutes.)

- Introduce a middleware / reverse-proxy that holds an unbounded-in-size no-expire request queue, with one queue per API key, where requests are popped fairly from each queue (or prioritized according to the plan the user is paying for). Ensure backends only select(1) requests out from the middleware’s downstream sockets as quickly as they’re able to handle them. Require API requests to have explicit TTLs — a time after which serving the request would no longer be useful. If a backend pops a request and finds that it’s past its TTL, it discards it, answering it with an immediate 504 error.
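
To make the last two options a bit more concrete, here is a rough Python sketch. Everything named in it (`schedule_offset`, `Request`, `FairQueue`, the `deadline` field) is hypothetical and not something we actually run. It assumes the TTL is carried as an absolute deadline, that “fair” means plain round-robin across API keys (no plan-based priority), and that a hash-modulo slot assignment is a good enough stand-in for a real hash ring.

```python
import hashlib
from collections import OrderedDict, deque
from dataclasses import dataclass


def schedule_offset(client_id: str, period_s: float, slots: int = 2) -> float:
    """Map a client ID to one of `slots` evenly spaced offsets in a period.

    Hypothetical helper: hashing the ID spreads clients roughly evenly,
    and the same ID always lands on the same offset.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % slots
    return slot * period_s / slots


@dataclass
class Request:
    api_key: str
    payload: str
    deadline: float  # assumed: absolute time after which a reply is useless


class FairQueue:
    """One FIFO per API key, drained round-robin across keys."""

    def __init__(self) -> None:
        # api_key -> deque of pending requests; dict order is the turn order
        self._queues: "OrderedDict[str, deque]" = OrderedDict()

    def push(self, req: Request) -> None:
        self._queues.setdefault(req.api_key, deque()).append(req)

    def pop(self, now: float):
        """Return (request, expired), or None when nothing is queued.

        `expired` is True when the request is past its deadline, in which
        case the backend would answer with an error instead of doing work.
        """
        if not self._queues:
            return None
        key, queue = next(iter(self._queues.items()))
        req = queue.popleft()
        if queue:
            self._queues.move_to_end(key)  # this key goes to the back of the line
        else:
            del self._queues[key]
        return req, now > req.deadline
```

With a 300-second period and two slots, `schedule_offset` puts each client at either 0 or 150 seconds, which is the T+0 vs. T+2.5 split described above. The queue only helps if backends call `pop` at the rate they can actually handle; a noisy key with thousands of queued requests then delays only itself.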