by thraxil
1780 days ago
Retries without jitter are indeed a common source of thundering herd problems. Even with exponential backoff, if all the clients are retrying simultaneously, they'll hammer your servers over and over. Add jitter (a random amount of extra delay that differs for every client and every retry) and the retries get staggered, so the requests are spread out.
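To make that concrete, here's a minimal sketch of exponential backoff with additive jitter. The names (`backoff_delay`, `call_with_retries`), the base/cap values, and the choice to draw the extra delay uniformly from zero up to the backoff itself are all my own assumptions for illustration, not anything from a specific library.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Delay in seconds before retrying after zero-based `attempt`.

    Hypothetical helper: exponential backoff (capped at `cap`) plus a
    random extra delay, so clients that failed together do not retry together.
    """
    backoff = min(cap, base * (2 ** attempt))
    jitter = rng.uniform(0.0, backoff)  # different for every client and retry
    return backoff + jitter


def call_with_retries(operation, max_attempts=5, sleep=time.sleep):
    """Call `operation()` until it succeeds or `max_attempts` is used up."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # assumption: every failure is retryable
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(backoff_delay(attempt))
```

Without the `jitter` term, every client that failed at the same moment would come back at exactly 0.5s, 1s, 2s, and so on. With it, each wave is smeared across a window as wide as the backoff itself. In real code you'd want to catch only the errors that are actually worth retrying.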
Imagine you’re a service like Feedly, and one of your “direct customer” API clients (some feed-reader mobile app) has coded its apps so that every connected client re-requests its user’s unique feed at exact, crontab-like 5-minute offsets from the start of the hour. So every five minutes you get a huge burst of traffic from all these clients, and it’s all different traffic, with nothing coalescable.
You don’t control the client in this case, but nor can you simply ban them—they’re your paying customers! (Yes, you can “fire your customer”, but this would be most of your customers…)
And certainly, you can try to teach the devs of your client how to write their own jitter logic—but that rarely works out, as often it’s junior frontend devs who wrote the client-side code, and it’s hard to have a non-intermediated conversation with them.