by alexginzburg 3458 days ago
It seems that the author is slightly confused. The description of the thundering herd is mostly correct. The term originally referred to multiple processes calling accept() on a single listening socket, which used to wake up every one of them. This was fixed a long time ago. Currently, multiple processes blocking in accept() on a single listening socket are served in a round-robin fashion (AFAIK). Calling select() from multiple processes on a single fd should wake all of the processes up when IO comes in. This is documented behavior.

--Edit--

s/write/IO/

2 comments

Yes, all this 'fundamentally broken' talk is overblown rubbish. I was working on servers handling large numbers of sockets across lots of processes over a decade ago. Scaling non-blocking I/O was already a well-known area back then, and problems like the 'thundering herd' had multiple solutions across a variety of operating systems.
You are correct. Directly blocking on accept() in multiple processes does not have the "thundering herd" behaviour. This is good to know.

But this proves my next point: select() is a poor abstraction. It means that accept() is doing something more than just waiting for readability (it attempts round-robin), which is a thing you can't express with select().

In the article I used the accept() case to illustrate the thundering herd problem. A non-blocking connect() taking a long time also makes a good case. The same experiment could be done, though, by measuring write() or sendmsg() syscalls.
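
For anyone who wants to poke at this, here is a rough sketch of the select() case discussed above. It is not the article's experiment: threads stand in for processes to keep it short, and the names herd_demo, workers and timeout are made up for illustration. Every waiter that select() wakes goes on to try a non-blocking accept(), and only one of them can win the single connection.

```python
import select
import socket
import threading


def herd_demo(workers=4, timeout=1.0):
    """Return (woken, accepted) for a single incoming connection.

    Assumption: threads stand in for the processes discussed in the thread.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(16)
    listener.setblocking(False)       # losers must not block in accept()

    lock = threading.Lock()
    stats = {"woken": 0, "accepted": 0}

    def worker():
        # select() only reports readiness; it does not hand out the connection.
        readable, _, _ = select.select([listener], [], [], timeout)
        if not readable:
            return                    # timed out: connection was already taken
        with lock:
            stats["woken"] += 1
        try:
            conn, _ = listener.accept()
        except BlockingIOError:
            return                    # woken for nothing: another waiter won
        with lock:
            stats["accepted"] += 1
        conn.close()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    client = socket.create_connection(listener.getsockname())
    for t in threads:
        t.join()
    client.close()
    listener.close()
    return stats["woken"], stats["accepted"]


if __name__ == "__main__":
    woken, accepted = herd_demo()
    print(f"woken={woken} accepted={accepted}")
```

The woken count depends on scheduling, so it varies from run to run; the accepted count is always 1 because there is only one connection to hand out.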

First, the article starts by talking about accept() and the thundering herd, but the example shows the use of select().

Second, accept() takes connections from the queue of established connections that a listen() call creates.

select() and accept() are meant for different things. select/poll/epoll/kqueue work with a list of file descriptors to detect I/O, while accept() works with a single listening socket.
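
To make the distinction concrete, here is a small sketch (the function name select_then_accept and the idle UDP socket are my own, just for illustration). select() is handed a list of descriptors and reports which ones are ready; accept() is then called on the listening socket and takes one established connection off its queue per call.

```python
import select
import socket


def select_then_accept(n_clients=2):
    """Return (listener_ready, idle_ready, accepted)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)                # listen() sets up the connection queue

    # A second descriptor that never receives traffic, for contrast.
    idle = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    idle.bind(("127.0.0.1", 0))

    addr = listener.getsockname()
    clients = [socket.create_connection(addr) for _ in range(n_clients)]

    # select() watches a list of fds and reports which ones have I/O pending.
    readable, _, _ = select.select([listener, idle], [], [], 1.0)
    listener_ready = listener in readable
    idle_ready = idle in readable

    # accept() works on the one listening socket: each call dequeues one
    # established connection. Loop until select() stops reporting readiness.
    accepted = 0
    while select.select([listener], [], [], 0.5)[0]:
        conn, _ = listener.accept()
        conn.close()
        accepted += 1

    for c in clients:
        c.close()
    idle.close()
    listener.close()
    return listener_ready, idle_ready, accepted


if __name__ == "__main__":
    print(select_then_accept())
```

Note that the listener stays readable until the queue is drained, which is why one select() wakeup can correspond to several accept() calls.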