by chousuke 4164 days ago
Okay. So, init starts the process supervisor. Then, the process supervisor starts everything else.

Then all of a sudden, something goes wrong and the process supervisor crashes. Init then inherits all the children, and has no clue what's going on with them, so your system is hosed.

What benefit is there to having the init be separate from the process supervisor?

EDIT: let me just expand on this a bit... Monit and other process supervisors do more than just manage the lifecycle of a process; they can run various checks to ensure that a service actually works. Systemd's process supervision is limited to knowing the current state of the process (running, stopped by admin, failed to start, constantly crashing, or optionally: not responding to watchdog), so systemd in no way makes monitoring daemons redundant.
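To make the watchdog state concrete, here is a minimal sketch of how a service could ping it. It assumes the service runs under systemd with `WatchdogSec=` set in its unit, so that systemd exports `NOTIFY_SOCKET`; the helper name `notify` is made up for illustration. Note that this only tells systemd the process is still responsive, not that the service is doing useful work, which is the gap a monitoring daemon fills.

```python
import os
import socket


def notify(message):
    """Send one datagram to the socket named by NOTIFY_SOCKET.

    Returns False when NOTIFY_SOCKET is unset (e.g. not started by systemd).
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        addr = b"\0" + addr[1:].encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True
```

A service would call `notify("WATCHDOG=1")` from its main loop at an interval shorter than the configured timeout.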

When the traditional sysvinit starts and stops processes, it actually has no clue what it's doing and thus init scripts need to rely on PID files and other hacks to provide basic functionality.
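For anyone who hasn't looked inside an init script, the PID file hack boils down to roughly the following. This is a sketch assuming a POSIX system, with hypothetical helper names, not code from any particular distribution's scripts.

```python
import os


def pid_from_file(path):
    """Return the PID recorded in a PID file, or None if missing or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def looks_alive(pid):
    """Best-effort check that some process with this PID exists."""
    if pid is None or pid <= 0:
        return False  # 0 and negative values would address process groups
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Even when `looks_alive` returns True, nothing guarantees the PID still belongs to the service that wrote the file: the daemon may have died and the PID may have been reused by an unrelated process.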

I would like to know why you think that the process that starts and stops processes should not be interested in whether the processes are actually running or not.


> When the traditional sysvinit starts and stops processes, it actually has no clue what it's doing and thus init scripts need to rely on PID files and other hacks to provide basic functionality.

Tooting my own horn here, but I wrote a filesystem called runfs that specifically addresses this. A service writes a PID file to runfs, and runfs automatically removes it once the process dies.

Code: https://github.com/jcnelson/runfs

Now you have the process supervisor in your init. What happens if your supervisor crashes now?

So obviously that's not a good idea.

Additionally, it has the problem that you can't upgrade your process supervisor without rebooting.

So what benefits does it actually provide?

Your second statement misses the point. Your process supervisor can be ultimately stable and never crash, but if it does crash, it's just as fatal as init crashing.

systemd PID 1 won't be rendered prone to crashing just because it contains more than trivial amounts of code. If that were the case, then surely the Linux kernel would be crashing every fifteen minutes, considering how much code it contains.

As far as I'm aware, the only component that can screw up systemd is dbus, and since the relevant parts are moving into the kernel, you won't just be able to hose your system by killing the dbus daemon accidentally.

I have yet to see an argument for process supervision functionality not existing in PID 1, besides simply stating that it must be so. Meanwhile, an init which is guaranteed to know whether the processes it starts (or stops!) are actually running is able to behave much more intelligently than scripts sending signals to PIDs that hopefully correspond to the correct process.
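By contrast, a process that is the direct parent of the service gets the exit status from the kernel and never has to guess. A minimal sketch, assuming a POSIX system; `run_and_report` is a made-up name, and a real supervisor would of course track many children and not block on one.

```python
import subprocess


def run_and_report(argv):
    """Start argv as a child, wait for it, and report how it ended.

    Returns ("exited", status) or ("signaled", signal_number).
    """
    child = subprocess.Popen(argv)
    returncode = child.wait()  # the kernel reports the exit to the parent
    if returncode < 0:
        # On POSIX, subprocess encodes death by signal N as -N.
        return ("signaled", -returncode)
    return ("exited", returncode)
```

Because the PID of an unreaped child cannot be reused, the parent always knows it is looking at the process it started.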

No, your process supervisor crashing won't be as fatal as init crashing, because init crashing is an instant kernel panic under Linux. Also, very few projects are as robust as the Linux kernel or have development practices that are as good, and systemd is unlikely to be one of them.
When the supervisor crashes, which should occur extremely rarely in any case, the state of which services are running is lost, and its child processes are re-parented to PID1, so when a new instance of the supervisor starts it cannot tell which services are running, and which of the running processes belong to which service. During the time the supervisor needs to re-start (presumably init would respawn it?), some of the running processes could exit without notice.

What is gained by restarting then? You'll likely want to reboot to get the system into a consistent state anyway.

Here is a war story of an embedded developer who actually created his own init system with separate supervisor process, and found that it doesn't actually make the system as a whole more robust:

https://lwn.net/Articles/623527/

For a good discussion of the trade-offs involved, see this comment by JdeBP:

https://news.ycombinator.com/item?id=8384251

Edited to add: Also, is your comment about bad development practices in the systemd project purely based on statistical conjecture (which would mean it applies to every single project, except of course the Linux kernel where presumably you have personally observed the absence of bad practices), or do you have anything to back that up?

Lovely exchange in those LWN comments about monolithic vs modular. In particular, how the proponents are (perhaps willfully) confusing public and private APIs...
You can restart systemd on the fly with `systemctl daemon-reexec`.
Both of you are really close to agreeing ...