|
|
|
|
|
by Joker_vD
285 days ago
|
|
> Docker’s lack of init by default If anything, it's the problem with the design of the UNIX's process management, inherited thoughtlessly, which Docker decided to not deal with on its own. Why does there have to be a whole special, unkillable process whose only job is to call wait(2) in an infinite loop? Because in the original UNIX design, Ken Thompson apparently did not want to do too much work in the kernel during exit(2): if process A calls exit(2) while having 20 already exited children it didn't wait for, you either have to reap those 20 processes (which involves reading their PCBs from the swap on disk), and then potentially reap their already exited children, and their grandchildren... or you can just iterate over the process table and set the ppid of A's children to 1 and schedule PID 1 to run and let it deal with reaping one process at a time in wait(2). Essentially, the work is pushed to the scheduler, but the logic itself lives in the user space at the cost of PID space pollution. |
|
The funny thing is that there is a way to opt out of zombie reaping as pid1 or a subreaper -- set sigaction of SIGCHLD to SIG_IGN (and so it really isn't that hard on the kernel side). Unfortunately this opts you out of all child death events, which means process managers can't use it.
If you want to argue that interaction of zombies with re-parenting is borked, that is a very different discussion (though re-parenting itself is necessary for daemonisation). If there was a way to only opt-out of zombie reaping for reparented processes things would be much nicer.
IMHO the bigger issue with Docker and pid1 is that pid1 signal semantics (for instance, most signals are effectively SIG_IGN by default) are different than other processes and lots of programs didn't deal with that properly back then. Nowadays it might be a bit better, it Docker has also had a built-in minimal init for many years (just use --init) so the problem is basically solved these days.