by stingraycharles 1763 days ago
Tangent, but I always had a different understanding of the “thundering herd” problem; that is, if a service is down for whatever reason, and it’s brought back online, it immediately grinds to a halt again because there are a bazillion requests waiting to be handled.

And the solution to this problem is to bring the service back online slowly, with rate limiting, rather than letting the whole thundering herd go through the door immediately.

5 comments

That's really not the traditional meaning of thundering herd, which is about waking up all the processes when a connection comes in, then they all try to accept it and it's a lot of work for nothing. You get much better results if only a single process is woken up for each event.

Your problem is a real problem though. Where I worked, we would call that backlog, and we would manage it with 'floodgates' ... When the system is broken, close the gates, and you need to open them slowly.

In an ideal world, your system would self-regulate from dead to live, shedding load as necessary, but always making headway. But sometimes a little help is needed to avoid the feedback loop where timed-out client requests still get processed on the server, keeping the server in overload.
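
For anyone who wants to picture the 'floodgates' idea, here is a minimal sketch, assuming a single process and a token bucket whose refill rate ramps linearly after the gate opens. The `Floodgate` class, its parameters, and the linear ramp are all made up for illustration; they are not from any particular system.

```python
import time

class Floodgate:
    """Hypothetical admission gate: closed, or open with a rate that ramps up."""

    def __init__(self, full_rate, ramp_seconds, clock=time.monotonic):
        self.full_rate = full_rate        # requests/second once fully open
        self.ramp_seconds = ramp_seconds  # time to go from 0 to full_rate
        self.clock = clock                # injectable so the ramp can be tested
        self.opened_at = None             # None means the gate is closed
        self.tokens = 0.0
        self.last = 0.0

    def open(self):
        self.opened_at = self.last = self.clock()
        self.tokens = 0.0

    def close(self):
        self.opened_at = None

    def current_rate(self):
        if self.opened_at is None:
            return 0.0
        elapsed = self.clock() - self.opened_at
        return self.full_rate * min(1.0, elapsed / self.ramp_seconds)

    def try_admit(self):
        """Return True if one request may pass right now."""
        if self.opened_at is None:
            return False
        now = self.clock()
        # Token bucket whose refill rate is the current (ramping) rate.
        # Using the rate at `now` for the whole interval is an approximation.
        # The bucket is capped at one second's worth of full-rate traffic.
        self.tokens = min(self.full_rate,
                          self.tokens + (now - self.last) * self.current_rate())
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Requests that get `False` would be rejected or told to retry later; what to do with them is a separate decision that this sketch leaves out.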

Yea you are right. It could be a service being down and requests piling up, or a cache key expiring and many processes trying to regenerate the value at the same time, etc.

I think the article just used this phrase to describe something else. (Great article otherwise).

There is an explanation of this kind of thundering herd about 3/4 down this article https://httpd.apache.org/docs/trunk/misc/perf-scaling.html

The short version is that when you have multiple processes waiting on listening sockets and a connection arrives, they all get woken up and scheduled to run, but only one will pick up the connection, and the rest have to go back to sleep. These futile wakeups can be a huge waste of CPU, so on systems without accept() scalability fixes, or with more tricky server configurations, the web server puts a lock around accept() to ensure only one process is woken up at a time.

The term (and the fix) dates back to the performance improvement work on Apache 1.3 in the mid-1990s.
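
A rough sketch of the "lock around accept()" idea, using threads and a loopback socket so it runs as a single script. Real servers do this across processes with a cross-process mutex; the function name, the echo behaviour, and the 0.05 s timeout here are assumptions chosen only to keep the example self-contained.

```python
import socket
import threading

def serve_with_accept_lock(n_workers, n_clients):
    """Echo n_clients connections; return (worker ids that handled them, replies)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(n_clients)
    listener.settimeout(0.05)         # lets workers notice the stop flag
    accept_lock = threading.Lock()
    stop = threading.Event()
    handled = []                      # one worker id per connection served

    def worker(worker_id):
        while not stop.is_set():
            with accept_lock:         # only the lock holder sits in accept()
                try:
                    conn, _ = listener.accept()
                except socket.timeout:
                    continue
            with conn:                # lock already released: others may accept
                conn.sendall(conn.recv(1024))
            handled.append(worker_id)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for w in workers:
        w.start()
    replies = []
    for i in range(n_clients):
        with socket.create_connection(listener.getsockname()) as c:
            c.sendall(b"ping %d" % i)
            replies.append(c.recv(1024))
    stop.set()
    for w in workers:
        w.join()
    listener.close()
    return handled, replies
```

The point is only the shape: the other workers queue on the lock instead of all sleeping in accept(), so one connection is picked up by exactly one of them.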

Phrase borrowed from excellent uWSGI docs https://uwsgi-docs.readthedocs.io/en/latest/articles/Seriali...
Funny reading this comment after reading the article

> So many options meant plenty of levers to twist around, but the lack of clear documentation meant that we were frequently left guessing the true intention of a given flag.

And then reading your link, they complain >inside the docs< that the docs aren't complete. I have no idea what to believe anymore :D

The uWSGI docs also say, in the section called "uWSGI developers are fu*!ing cowards": "why --thunder-lock is not the default when multiprocess + multithread is requested? This is a good question with a simple answer: we are cowards who only care about money."

Strange read.

That's not the thundering herd. If someone rings the door (request), only one person (agent, process) needs to answer the door. But what might happen is that everyone in the house rushes to answer the door. The people "thundering" to the door (and making a mess as they do so) are the "herd". This can quickly become a problem if there are a lot of people in the house and the doorbell keeps ringing.
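
The doorbell picture can be sketched with a condition variable, assuming threads stand in for the people in the house. `ring_doorbell` and all of its bookkeeping are hypothetical names for this illustration; it rings once and counts how many sleepers actually got up.

```python
import threading

def ring_doorbell(n_people, wake_everyone):
    """Ring once with n_people asleep; return (wakeups, futile_wakeups)."""
    lock = threading.Lock()
    bell = threading.Condition(lock)      # people sleep on this
    settled = threading.Condition(lock)   # the caller sleeps on this
    s = {"asleep": 0, "left": 0, "wakeups": 0, "rings": 0, "stop": False}

    def person():
        with lock:
            while True:
                s["asleep"] += 1
                settled.notify()
                bell.wait()
                if s["stop"]:
                    return
                s["wakeups"] += 1
                if s["rings"] > 0:
                    s["rings"] -= 1       # this person answers the door
                    s["left"] += 1
                    settled.notify()
                    return
                # Otherwise a futile wakeup: loop and go back to sleep.

    threads = [threading.Thread(target=person) for _ in range(n_people)]
    for t in threads:
        t.start()
    with lock:
        settled.wait_for(lambda: s["asleep"] == n_people)
        s["rings"] = 1
        woken = n_people if wake_everyone else 1
        s["asleep"] -= woken              # those notified are no longer asleep
        bell.notify(woken)
        # Wait until every woken person has either answered or gone back to sleep.
        settled.wait_for(lambda: s["asleep"] + s["left"] == n_people)
        s["stop"] = True
        bell.notify_all()                 # shutdown wakeups are not counted
    for t in threads:
        t.join()
    return s["wakeups"], s["wakeups"] - s["left"]
```

With 8 people, waking everyone should report 8 wakeups of which 7 were for nothing, while waking one person reports a single wakeup and none wasted.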
> but I always had a different understanding of the “thundering herd” problem; that is, if a service is down for whatever reason, and it’s brought back online, it immediately grinds to a halt again because there are a bazillion requests waiting to be handled.

That... doesn't have much to do with the thundering herd problem. It also doesn't make much sense as a concept on its own merits -- say you come in to work and your inbox is full enough for three inboxes. Does that fact, in itself, mean that you decide you're done for the day? No, it just means you have a much longer queue to work through than usual.

The thundering herd problem refers to what happens when (1) a bunch of agents come to you for something while you're busy; (2) you tell them all "I'm busy, go away and come back later"; and (3) the come-back-later time you give to each of them is identical, so they all come back simultaneously.

And that's exactly what's happening here, except that instead of giving each worker thread a come-back-later time when it asks for work, you're receiving work, sending out individual messages to every worker saying "hey, I'm not busy anymore, come back RIGHT NOW and get some more work", and then rejecting all but one of the thundering herd that shows up. The reason the Gunicorn docs and the uWSGI docs both refer to this as a "thundering herd" problem is that it's a near-perfect match for the problem prototype. The only difference is that, instead of giving out identical come-back-later times to worker threads as they ask you for work, you tell them to wait for a notification that includes a come-back-later time, and then when you get one piece of work you fire off that notification separately to every sleeping thread, including identical come-back-later times in each one.
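
To make the "identical come-back-later time" point concrete, here is a small sketch, assuming clients retry with exponential backoff and pick a random delay below the backoff ceiling (often called full jitter). The function name and the `base`/`cap` defaults are made up for the example.

```python
import random

def come_back_later(attempt, base=0.5, cap=30.0, rng=random):
    """Delay in seconds before retry number `attempt` (0-based), with full jitter."""
    ceiling = min(cap, base * 2 ** attempt)   # plain exponential backoff
    return rng.uniform(0.0, ceiling)          # spread clients across [0, ceiling]
```

Without the `uniform` call every client would get the same `ceiling` and show up together; with it, each client lands somewhere different in the window.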

> That... doesn't have much to do with the thundering herd problem. It also doesn't make much sense as a concept on its own merits -- say you come in to work and your inbox is full enough for three inboxes. Does that fact, in itself, mean that you decide you're done for the day? No, it just means you have a much longer queue to work through than usual.

If my SLA is a 24-hour response time, and the inbox is FIFO, and I can't drop old messages, I'm most likely not hitting the SLA. If they all came in overnight, I'll hit the SLA for day 1, but I will be busy all of day 2 and 3 and never respond on time. If after day 1, I get a day's worth of messages every day, I'll never catch up.
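
The arithmetic can be sketched like this, assuming work is measured in "days of work", capacity is exactly one day of work per day, and the queue is strictly FIFO. `backlog_by_day` is a hypothetical helper, not anything from a real system.

```python
def backlog_by_day(initial_days_of_work, daily_arrivals, days):
    """End-of-day backlog (in days of work) for a FIFO inbox with 1 day/day capacity."""
    backlog = float(initial_days_of_work)
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog - 1.0)  # work through one day's worth
        backlog += daily_arrivals          # new messages join the back of the queue
        history.append(backlog)
    return history

# Three inboxes' worth overnight, then a normal day's mail every day:
# backlog_by_day(3, 1, 5) -> [3.0, 3.0, 3.0, 3.0, 3.0]
# Same start, but no new mail at all:
# backlog_by_day(3, 0, 5) -> [2.0, 1.0, 0.0, 0.0, 0.0]
```

With FIFO, a new message waits roughly as long as the backlog ahead of it, so a backlog that stays at 3 days means every reply is about 3 days late against a 1-day SLA.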

OK. But that's not a problem of a thundering herd. It's a problem that you have more incoming work than you are theoretically able to handle even if you stay in continuous operation. Your problem is solved by adding the capacity to do more work. The thundering herd is solved by purposefully desynchronizing incoming work requests.
Oh, I agree it's not thundering herd, but it is a real problem. Especially if you start getting retries after the first requests timed out. Some sort of backoff with jitter to avoid synchronized retries helps, but what really helps is dropping or not accepting requests when the processing will not be timely. That's simple to say, but not always simple to do.

Adding capacity is also simple to say, but not always simple to do. And there can be a large difference between the capacity needed to handle a cold start at peak vs the capacity needed for peak under regular operations.
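
As a sketch of "not accepting requests when the processing will not be timely", assuming one FIFO worker and a known, fixed service time (which is exactly the part that is hard to know in practice). The `SheddingQueue` class and its millisecond parameters are invented for the example.

```python
from collections import deque

class SheddingQueue:
    """Hypothetical FIFO queue that refuses work it cannot finish by the deadline."""

    def __init__(self, service_ms, deadline_ms):
        self.service_ms = service_ms    # assumed fixed cost per request
        self.deadline_ms = deadline_ms  # how long a client is willing to wait
        self.queue = deque()

    def offer(self, request):
        """Enqueue and return True, or return False to shed the request."""
        # Everything already queued runs first, then this request itself.
        estimated_ms = (len(self.queue) + 1) * self.service_ms
        if estimated_ms > self.deadline_ms:
            return False                # client would have timed out anyway
        self.queue.append(request)
        return True

    def take(self):
        """Hand the oldest request to the worker, or None if idle."""
        return self.queue.popleft() if self.queue else None
```

Rejecting early keeps the server from spending time on requests whose clients have already given up, which is the feedback loop that keeps a recovering server in overload.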

This reminds me of inrush current when starting large motors... You get a huge current spike when you initially turn on the motor, so large that it can trip the breaker.

One solution is to use a soft starter which slowly brings the motor up to speed.