by wahern 563 days ago
AFAIU, the semantics of SO_REUSEPORT for TCP were consistent across the BSDs, at least since the time the Google folks started exploring this area. For TCP the behavior was effectively a LIFO/stack-like queue of listening sockets; incoming connections were enqueued on the most recently bound socket. If that socket was closed, then the next most recent socket, if any, would be chosen. I don't know if this was originally deliberate, but it effectively permitted seamless, robust server restarts (automated or manual) without accidentally losing connections, without requiring an intermediate process (e.g. inetd or systemd), and without any complicated IPC, so long as the older process drained its listening queue before exiting.
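To make the "drain the listening queue before exiting" step concrete, here is a minimal Python sketch of what the older process might do once its replacement has bound the port. The helper name `drain_and_close` is hypothetical, and the sketch assumes the BSD-style behavior described above, i.e. that no new connections are queued on the old socket after the newer one is bound.

```python
import socket

def drain_and_close(listener: socket.socket) -> list[socket.socket]:
    """Accept every connection already queued on `listener`, then close it.

    Hypothetical helper: it assumes a newer listener is already bound, so
    nothing new is being queued on this socket while we drain it.
    """
    listener.setblocking(False)
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # accept queue is empty
        # Whether an accepted socket inherits non-blocking mode is
        # OS-dependent, so set the mode explicitly.
        conn.setblocking(True)
        accepted.append(conn)
    listener.close()
    return accepted
```

The old process would then finish serving the returned connections and exit. On a system where new connections can still land on the old socket, this loop alone would not be enough to avoid dropping them.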

Unfortunately, this behavior was never documented in the manual pages; only the load balancing-like UDP semantics were documented. Previously Linux didn't support SO_REUSEPORT at all, and unfortunately whoever decided to make use of SO_REUSEPORT on Linux either didn't check or didn't care what the actual behavior was for TCP connections on the BSDs. AFAIU, on the Linux side the original problem being solved was the infamous thundering herd issue when multiple processes or threads were polling on listening TCP sockets, as can occur with non-blocking/asynchronous I/O frameworks that utilize multiple processes or threads, each with its own event loop polling (i.e. poll/epoll_wait/kqueue) for incoming connections. The semantics they chose were largely compatible with how the BSDs supported this for UDP, but not with how it worked for TCP.
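For reference, the per-worker listener pattern looks roughly like this in Python. It is only a sketch: `reuseport_listener` is a made-up helper name, and the loopback host and backlog defaults are arbitrary. It relies on `socket.SO_REUSEPORT`, which Python exposes only on platforms that define it.

```python
import socket

def reuseport_listener(port: int, host: str = "127.0.0.1",
                       backlog: int = 128) -> socket.socket:
    """Create a TCP listener that shares `port` with other such listeners.

    Hypothetical helper. Every socket sharing the port must set
    SO_REUSEPORT before bind(); which listener then receives a given
    connection is up to the kernel.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Each worker process or thread would call this with the same port and poll only its own socket. On Linux the kernel spreads incoming connections across the listeners, whereas under the BSD TCP behavior described earlier they would all go to the most recently bound one.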

Arguably, at least as regards TCP, SO_REUSEPORT wasn't added to Linux so much as an entirely different feature was created and the implementers decided to squat on the pre-existing macro definition out of convenience. (SO_REUSEPORT had always been defined, but, IIRC, simply ignored by Linux, similar to SO_RCVLOWAT.) That's not charitable, but neither was the decision to either not investigate the actual BSD behavior or, if they did, to ignore or discount it.