Hacker News
by epaulson 988 days ago
It is kinda bonkers that after 30 years Linux still doesn't have some kind of inescapable and recursive process group mechanism. How many tens of thousands of hours of programming time would have been saved over the years if there was a way to do the following:

1. Your process can create a new process group ID (and write it out to stable storage if it wants).

2. Add an argument to fork so the same process can pass that ID into fork and the child of the fork wakes up permanently in that group, and any child processes later created by that child will also be in that group.

3. Any new groups created by any process in the group are always contained in the original group; you can never leave a group, but you might be in multiple groups.

4. There's a way to kill an entire group and verify that everything for an identifier is dead.

I think Solaris and some of the BSDs had something like this, so I wish Linux would add one too (though, I guess Linux has managed without one for so long that maybe that's proof enough that it's not really needed, and worst case you can always just reboot the box - nuking from orbit is the only way to be sure)

8 comments

It does? Cgroups. The problem is that POSIX doesn't have them, so it's not portable.
Cgroups have several problems:

1. They are relatively complicated to use, and even harder to use properly. From what I understand, to reliably kill all processes you need to freeze the cgroup, then list the pids in it, then send a signal to each of those pids. That is pretty involved, requires a separate supervisor process, and isn't 100% reliable in cgroupv1.

2. It requires root, or at least having control of a cgroup delegated to the process. You might be able to use user namespaces, depending on the distro and kernel, but that makes the implementation even more complicated.

3. It is possible to escape the cgroup, if the child process has permission to write to the task file of another cgroup.

Cgroups are useful, and can be used for this use case in some common scenarios, such as docker and systemd.

But as a general tool for structured concurrency that normal processes can use, it doesn't quite fit the bill.
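To make point 1 concrete, here is a minimal sketch of the freeze, list, signal sequence described above. It assumes a cgroup v2 directory that the caller may write to, uses the `cgroup.freeze` and `cgroup.procs` files, and simplifies in two ways: it does not wait for the freeze to complete and it does not descend into child cgroups. The names `read_pids` and `kill_cgroup` are made up for illustration.

```python
import os
import signal
from pathlib import Path


def read_pids(cgroup_dir):
    """Return the PIDs listed in cgroup.procs of the given cgroup directory."""
    text = (Path(cgroup_dir) / "cgroup.procs").read_text()
    return [int(line) for line in text.split()]


def kill_cgroup(cgroup_dir, sig=signal.SIGKILL, send=os.kill):
    """Freeze the cgroup, signal every listed PID, then thaw it.

    Assumption: cgroup_dir is a cgroup v2 directory delegated to the caller.
    `send` is injectable so the logic can be exercised without real processes.
    """
    freeze = Path(cgroup_dir) / "cgroup.freeze"
    freeze.write_text("1")      # ask the kernel to stop members so the PID list stays stable
    try:
        pids = read_pids(cgroup_dir)
        for pid in pids:
            try:
                send(pid, sig)
            except ProcessLookupError:
                pass            # the process exited before we got to it
    finally:
        freeze.write_text("0")  # thaw the cgroup again
    return pids
```

A real supervisor would also poll `cgroup.events` until it reports the cgroup as frozen before reading the PID list, and would repeat the scan until `cgroup.procs` is empty to verify that everything is dead.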

It's sort of weird to think that portability with respect to cgroups is now primarily about whether your version of the Linux kernel supports the revision of cgroups that you're concerned about, rather than whether you have a kernel that understands cgroups at all.
How many versions of cgroups APIs are there?
Two. And what's funny, I saw some of the fallout from the transition from v1 to v2 only last year at my last gig. The company upgraded Debian, and the Debian maintainers had opted to go v2-only in the kernel build. However, the version of the JVM the company was using did not fully support v2. Without the cgroup support, each JVM in a container thought it had the resources of the whole system. It was a cluster (heh) fuck: all of the affected services OOM'd, thinking they had more heap potential than they did, and the k8s cluster was churning pods like no tomorrow.
PR_SET_CHILD_SUBREAPER was added in Linux 3.4 (2012) and covers your points 1 to 3; you have to kill stuff manually, though.

Works well, needs no superuser or special filesystems, always composable. Not sure why people don't use it much.

It requires a bit of annoying machinery since you have to dedicate a process to being a subreaper. Still it's probably the best option, I made https://github.com/catern/supervise with it, see also https://catern.com/process.html
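For anyone who hasn't seen it, a minimal sketch of the subreaper approach from Python follows. It assumes Linux 3.4 or later and calls `prctl` through `ctypes`; the option values are the ones from `<linux/prctl.h>`, and the helper names `become_subreaper`, `is_subreaper`, and `reap_all` are made up for illustration. This is not how the linked supervise project does it, just the bare mechanism.

```python
import ctypes
import os

# Option values from <linux/prctl.h>
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def become_subreaper():
    """Mark the calling process as a child subreaper: orphaned descendants
    get reparented to it instead of to init."""
    if _libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), _ZERO, _ZERO, _ZERO) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_subreaper():
    """Return True if the calling process currently has the subreaper flag set."""
    flag = ctypes.c_int(0)
    if _libc.prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(flag), _ZERO, _ZERO, _ZERO) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return bool(flag.value)


def reap_all():
    """Block until every child, including adopted orphans, has exited.
    Returns a dict mapping pid to the raw wait status."""
    statuses = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            return statuses     # no children left
        statuses[pid] = status
```

This only handles the tracking half. Killing the tree is still manual: the dedicated process has to signal its descendants and keep reaping until `waitpid` reports that none are left.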
No need for nuking from orbit. Boiling the ocean (i.e. spawning a VM) is also an option.
What about PID namespaces? (You can use those without using a full-on container solution like docker or podman)
Requires root, or use of user namespaces as well. You also need a process that can act as pid 1, with the special semantics that involves.
Linux absolutely has what you ask for, it's just not really used.

Check the PGID column in the output of `ps -efj`.

PGID can be and is often set at will, and this escapes a process out of its process group hierarchy.
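A small sketch of that escape, assuming a POSIX system with `fork`. The child calls `setpgid(0, 0)` to move itself into a fresh process group, after which a signal sent to the parent's group no longer reaches it. The function name `spawn_escapee` is made up for illustration.

```python
import os
import signal


def spawn_escapee():
    """Fork a child that moves itself into a new process group.
    Returns the child's PID once the move has happened."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.setpgid(0, 0)    # new group whose PGID equals the child's PID
            os.write(w, b"x")   # tell the parent the move is done
            signal.pause()      # sit here until a signal arrives
        finally:
            os._exit(0)
    os.close(w)
    os.read(r, 1)               # block until the child has changed groups
    os.close(r)
    return pid
```

After `pid = spawn_escapee()`, `os.getpgid(pid)` equals `pid` rather than `os.getpgrp()`, so `os.killpg(os.getpgrp(), ...)` would miss the child. It has to be cleaned up individually with `os.kill(pid, signal.SIGKILL)` followed by `os.waitpid(pid, 0)`.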
Aren't you talking about cgroups? They "allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored." `man cgroups` and `man unshare`.
Or even just have a way for a process to tell the OS that when it dies, all descendant processes should be sent a signal (including SIGKILL), with no way for them to opt out of it.
99% of the time when people say "I wish Linux had ...", the problem is that they're not using systemd.

It's possible to do it without systemd of course, but that involves an ad hoc, informally-specified, bug-ridden, slow implementation of half of systemd anyway.