by drchickensalad 3357 days ago
You have to send all the traffic from one client to one server then, right? That seems to come with heavy drawbacks.

Otherwise you can easily get them to hit their limit super early with bad luck.

No, you evenly round robin all traffic to all servers.

Each server contains a map of tokens per client, filling at a fixed interval.

The per-server refill rate is the total global token refresh rate divided by the number of servers, so the interval between refills on each server is correspondingly longer.

The end result is exactly the same, but now you are stateless and have eliminated the bottleneck of a central token bucket.
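
To make the scheme above concrete, here is a minimal sketch of what one server might hold. The class name `LocalTokenBucket`, the `global_burst` parameter, and the injectable `clock` are assumptions for illustration, not anything from the comment. It also refills continuously from elapsed time instead of on a fixed timer tick, which is a simplification.

```python
import time
from collections import defaultdict


class LocalTokenBucket:
    """Hypothetical per-server limiter: one bucket per client, refilled at
    the global rate divided by the number of servers."""

    def __init__(self, global_rate, global_burst, num_servers, clock=time.monotonic):
        self.rate = global_rate / num_servers        # tokens per second on this server
        self.capacity = global_burst / num_servers   # max tokens per client on this server
        self.clock = clock
        self.tokens = defaultdict(lambda: self.capacity)  # new clients start full
        self.last = {}                                    # last refill time per client

    def allow(self, client_id):
        now = self.clock()
        elapsed = now - self.last.get(client_id, now)
        self.last[client_id] = now
        # Refill for the time that has passed, capped at this server's share.
        self.tokens[client_id] = min(
            self.capacity, self.tokens[client_id] + elapsed * self.rate
        )
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False
```

With a global limit of 30 requests per second across 3 servers, each server would allow a given client about 10 per second, and no server needs to talk to any other.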

Wait, each client does its own round robin (if you have three servers, I will hit 1 then 2 then 3)? Is that common?

The client doesn't do it. You put your front ends behind a load balancer like an ELB, or use a reverse proxy like Nginx.

Edit: And yes, round robin is the most commonly used load distribution technique, and works very well assuming each request has a roughly equivalent unit of work cost.

I'm surprised you wouldn't run into cases where the requests being rate-limited end up unevenly distributed between servers. There are assumptions you could add that would make that not a problem, but I'm surprised they'd hold.
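
A contrived sketch of the kind of skew being described, assuming the load balancer keeps a single round robin counter across all clients. `route_round_robin` is a hypothetical helper, and the perfectly interleaved traffic pattern is a deliberate worst case, not a claim about real workloads.

```python
from collections import Counter
from itertools import cycle


def route_round_robin(requests, num_servers):
    """Assign each request to the next server in turn and record,
    per client, how many requests landed on each server."""
    servers = cycle(range(num_servers))
    hits = {}
    for client in requests:
        hits.setdefault(client, Counter())[next(servers)] += 1
    return hits


# Two clients whose requests happen to alternate exactly: A, B, A, B, ...
hits = route_round_robin(["A", "B"] * 50, num_servers=2)
print(dict(hits["A"]), dict(hits["B"]))
```

Here every request from A lands on server 0 and every request from B on server 1, so each client only ever draws from one server's share of the tokens and is effectively capped at half the global limit.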