Hacker News new | ask | show | jobs
by donavanm 3371 days ago
A few questions. Have you looked at doing batch token assignment or "leases" from the centralized source to the workers? Ie request 10 tokens from the central "bucket" when the local "bucket" reaches zero. You can dramatically reduce the number of remote operations, reducing syncronization latency and cost. Scaling the batch size based on past request rate is a nice further optimization.

Similarly Im not sure about your "atomicity requirements." Is percieved accuracy driving the centralized store and single token semantics? Time skew alone is going to drive 0.1-1% innaccuracy in your rate limiters unless youre in crazy town doing 1pps or ptp everywhere.

Some food for thought is consolidating "rate" and "concurrent" limitations. Youll frequently see the same dichotomy in pps and bps based limits. If you can derive a consistent unit of work you can use a single bucket. Ie 10 requests that take 1 cpu second each have the same "cost" as 1 request at 10 cpu seconds. In the macro queue theory has them as the same impact to the system.

You "fleet usage" case, and a few other comments, reads like a simulacrum of weighted queues. Expanding it to weighted priority queues is a really powerful abstraction. Imagine requests which are over their base rate limit are not discarded, but actually marked as the lowest priority. That allows you to do generalized strategies like rejecting requests (load shedding) based on queue depth (latency) and FIFO/FILO.

PS: everyone reading _please_ respond with a 400 series if the request fails due to client request properties like rate. It really really really sucks to try and guess if 503 means "I" was rate limited and should back off, or "you" failed and the request should be immediately retried.

PPS A tactic to consider is returning the estimated recharge time when a request is throttled. A good client can use that to intelligently back of and achieve homeostasis with your server resources.

1 comments

Too late to edit, but thought I should clarify. Nice introductory post and demo code for the concept. I wrote the parent in a hurry after work. I meant it as a mix of earnest questions and comments that came to me as I read through. Apologies if it came across as unnecessarily challenging.