by thaumasiotes 1768 days ago
OK. But that's not a problem of a thundering herd. It's a problem that you have more incoming work than you are theoretically able to handle even if you stay in continuous operation. Your problem is solved by adding the capacity to do more work. The thundering herd is solved by purposefully desynchronizing incoming work requests.
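
To make "purposefully desynchronizing" concrete, here is a minimal sketch in Python. It assumes a set of clients that would otherwise all fire at the same instant, and gives each one a random start offset inside a window. The function name, the window length, and the seed parameter are all made up for illustration.

```python
import random

def spread_start_times(n_clients, window_seconds, seed=None):
    """Return one random start offset (in seconds) per client.

    Hypothetical helper: instead of every client firing at t=0,
    each one waits a uniformly random time in [0, window_seconds].
    """
    rng = random.Random(seed)  # seed is optional, only for reproducible demos
    return [rng.uniform(0.0, window_seconds) for _ in range(n_clients)]

# Example: 1000 clients spread over a 60 second window.
offsets = spread_start_times(1000, 60.0, seed=1)
```

The total amount of work is unchanged; it just arrives spread across the window rather than as one spike, which is why this does nothing for a sustained overload.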

Oh, I agree it's not thundering herd, but it is a real problem. Especially if you start getting retries after the first requests timed out. Some sort of backoff with jitter to avoid synchronized retries helps, but what really helps is dropping or not accepting requests when the processing will not be timely. That's simple to say, but not always simple to do.
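
A rough Python sketch of both ideas, under stated assumptions: the retry delay uses exponential backoff with "full jitter" (a uniform draw up to a capped ceiling), and the admission check uses a deliberately crude wait estimate of queue length divided by worker count, times average service time. The function names, default values, and the estimate itself are illustrative, not taken from any particular system.

```python
import random

def backoff_with_full_jitter(attempt, base=0.1, cap=10.0, rng=random):
    """Delay in seconds before retry number `attempt` (0-based).

    Draws uniformly from [0, min(cap, base * 2**attempt)], so clients
    that failed together do not retry together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)

def should_accept(queue_length, avg_service_seconds, workers, deadline_seconds):
    """Admit a request only if it is likely to finish before its deadline.

    Assumption: waiting time is approximated as
    (queue_length / workers) * avg_service_seconds. Real systems usually
    need a better estimate, which is part of why this is hard in practice.
    """
    estimated_wait = (queue_length / workers) * avg_service_seconds
    return estimated_wait + avg_service_seconds <= deadline_seconds

# Example: with 100 requests queued, 4 workers, and 0.5 s per request,
# a new request with a 1 s deadline would be rejected up front.
accept = should_accept(queue_length=100, avg_service_seconds=0.5,
                       workers=4, deadline_seconds=1.0)
```

Rejecting early is cheaper than doing the work and having the client time out anyway, since a timed-out request typically comes back as a retry on top of the load that caused the timeout.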

Adding capacity is also simple to say, but not always simple to do. And there can be a large difference between the capacity needed to handle a cold start at peak vs the capacity needed for peak under regular operations.