| HN Mirror

systemd still has a race condition when handling forking servers. There's no way to atomically send a signal to a cgroup, so what systemd does is read the PIDs in the cgroup and then iteratively send SIGKILL to each one. However, between reading the PID and sending the signal is the classic PID file race.

There is a way to atomically send a signal to a traditional process group, however. What I do for my daemons--at least, those which create subprocesses--is have the master become a new session and process group leader using setsid, open a master/slave PTY pair, and assign the new PTY as the controlling terminal. Child processes inherit the controlling terminal from the master, and if the master ever dies then _all_ children with the controlling terminal and process group will atomically get SIGHUP. As long as your subprocesses aren't actively trying to subvert you, it's bullet-proof behavior and more robust than any hack using cgroups.

There's still the issue of figuring out how to kill the master process. Ideally the master process never forks away from, e.g., systemd. (Not sure if systemd will try to kill it directly, first, before relying on its cgroups hack. Also, sometimes becoming a session leader requires forking if, e.g., the invoker already made you a process group leader.) But if the master must be independent, the best way is for the master to be super simple and just have it open a unix domain socket to take start/stop commands.

But let's presume we want a failsafe method in case the master has some sort of bug and we need to send it SIGKILL. (This is what I always assume, actually.) No matter what you do there'll always be a race. However, the least racy way to get the PID using a PID file is using POSIX fcntl locks. fcntl locks provide a way to query the PID of the process holding a lock (no reading or writing of files involved; just a kernel syscall). Importantly, if the process dies it no longer holds the lock and so querying the owner cannot return a stale PID. So when I use a traditional "PID file", I don't write the PID to the file, I just have the master lock it. There's still the race between querying the pid and sending a signal, but at least you're not leaving a loaded gun around (i.e. a PID file with a stale PID written to it).

This method is no worse than systemd and arguably better in some respects.

Oddly, I don't think systemd even bothers with process groups. That's a shame because it's really the only race-free to kill a bunch of related processes. systemd could provide the option to spawn a service within a process group and to send SIGKILL to the process group first before resorting to the cgroups hack to pick up any stragglers (i.e. those that intentionally left the process group). It could even provide the controlling terminal trick as an option. But it doesn't. AFAIK it just use the imperfect cgroups hack.