Hacker News new | ask | show | jobs
by toast0 1765 days ago
That's really not the traditional meaning of thundering herd, which is about waking up all the processes when a connection comes in, then they all try to accept it and it's a lot of work for nothing. You get much better results if only a single process is woken up for each event.

Your problem is a real problem though. Where I worked, we would call that backlog, and we would manage it with 'floodgates' ... When the system is broken, close the gates, and you need to open them slowly.

In an ideal world, your system would self-regulate from dead to live, shedding load as necessary, but always making headway. But sometimes a little help is needed to avoid the feedback loop of timed out client requests that still get processed on the server keeping the server in overload.