by cyberax 291 days ago
I want to write a systemd haters handbook.

Like:

1. You start and stop services with 'systemctl start/stop nginx'. But logs for that service can be read through an easy-to-remember 'journalctl -xeu nginx.service'. Why not 'systemctl logs nginx'? Nobody knows.

2. If you look at the built-in help for systemctl, the top-level options list things like `--firmware-setup` and `--image-policy`.

3. systemd unifies devices, mounts, and services into unit files with consistent syntax. Except where it doesn't. For example, there's a way to specify a retry policy for a regular service, but not for mount units. Why? Nobody knows.

(To be clear, I _like_ systemd. But it definitely follows the true Unix philosophy of being wildly internally inconsistent.)

8 comments

I like systemd too. After working with it for a long time, a lot of the "wtf" moments eventually turn out to have at least some semblance of a good reason behind the decision.

1. systemctl is the controller. Its job is to change and report on the state of units. journalctl is the query engine. Merging the query engine into the systemctl controller would make the controller bloated and complex, so a dedicated tool is the cleaner approach. I think you can also rip out the journal and use other tools if you so decide, making building logs into systemctl a bad idea.

2. systemd is a system manager, not just a service manager. It replaced not only the old init system but also a collection of other tools that managed the machine's core state.

3. A service runs a process, which can fail for many transient reasons. Trying again is a sensible and effective recovery strategy. A mount defines a state in the kernel. If it fails, it's almost always for a "hard" reason that an immediate retry won't fix. Retrying a failed mount would just waste time and spam logs during boot.
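For concreteness, the service-side retry policy under discussion can be sketched as a drop-in file. This is only a minimal sketch, not anyone's actual config: the helper name `write_restart_dropin`, the file name `restart.conf`, and the 5-second delay are assumptions for illustration; `Restart=` and `RestartSec=` are the `[Service]` directives doing the work.

```shell
# Hypothetical helper: write a drop-in that restarts a service after a failure.
# $1 is the drop-in directory, e.g. /etc/systemd/system/nginx.service.d
write_restart_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # Restart=on-failure retries only after an unclean exit;
    # RestartSec= is the pause before each retry (5s is an example value).
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
}
```

Pointed at a real drop-in directory, this would presumably take effect after `systemctl daemon-reload`.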

These are fine points, and there are rough edges, but:

1. `systemctl status nginx.service` suffices in many cases. journalctl is for when you need to dig deeper, and it demands many more options. You would have complained about "too noisy CLI arguments" if these were unified.

2. I am not sure about how I should parse this. You mean there are too many arguments in total (2a) or the man page or the help message is not ordered correctly (2b)?

(2a). If you just care about services, you already know [well] a handful of subcommands (start, stop, enable, etc.) and just use those, the other args don't get in your way. For example your everyday commands have safe, sane default options that you will not have to override 99% of the time.

Furthermore, this is much better than the alternative of having a dozen different utilities that have a non-trivial inter-utility interaction that has to be solved externally. Sometimes an application that does (just) one thing won't do well.

(2b). This is subjective (?). I have experienced a few week-long total internet outages (in Iran). I had to study the man pages and my offline resources in those contingencies, and have generally been (extremely) satisfied with the breadth, depth, and the organization of the systemd docs. In the age of LLMs this is much less of a problem anyways. I think reading the man page of a well-known utility is not an everyday task, and for a one-off case you will grep the man page anyways.

3. Your point is ~valid. But automount exists for ephemeral resources. By default, we won't touch a failing drive without some precautions at least. So fail-fast and no retry is not always wrong. Perhaps it is virtue signaling ... On my PC I don't want to retry anything if a mount fails. In fact I might even want it to fail to boot so that it doesn't go undetected.

Also, for something as critical as mounting, I would probably want other "smart" behavior as well (exponential backoff for network, email, alert, DB fail-over, etc.) and these require specific application knowledge.

So ... they are trying to prevent a foot gun.

> 1. `systemctl status nginx.service` suffices in many cases. journalctl is for when you need to dig deeper, and it demands many more options. You would have complained about "too noisy CLI arguments" if these were unified.

I'm not at all a systemd hater (I think it was needed and it's nowadays a very solid piece of software) but the logs thing should be totally tweakable when viewing it from `systemctl status` and it is n.... [goes to check the man page]

  -n, --lines=
           When used with status, controls the number of journal lines to show, counting from the most recent ones. Takes a positive integer argument, or 0 to disable journal output. Defaults to 10.

Oooh, so TIL.
OMG, TIL, too. This made my morning.
As always RTFM applies - or at least glance through it at high speed.
I parsed (2) in the obvious way of: A manual should start with the common stuff 99% of people need and not with something obscure that you will only need once you are at the level that you know the tool you're using inside out.

That is like opening the manual for your dishwasher and reading a section about how you may check the control board's conformal coating after the warranty has expired. Useful when you need it and have the repair skills, but a bad way to start a manual.

That’s a tutorial or a getting started guide. The manual is a memory helper, like a tiny encyclopedia, not a teaching material.
How does that change my point about an order by frequency of use being superior? If it is a memory helper, then the stuff people tend to use more often is certainly the stuff that needs to be looked up more.
That’s a variable order. I prefer a more consistent order like a default section structure (which a lot of man pages adopt) and an alphabetical order for flags (which a lot of man pages also adopt).

When I open a manual it’s usually for: flags and argument ordering; argument format (for things like string format or globbing). Some manuals are short enough that they can serve as a guide, but most assume domain knowledge.

What you want is a cheatsheet. There are a lot of them on the internet, and even some tools that collect them. But most practitioners write shell aliases and functions.

I believe when people run one of my programs and read the manual, it is my job to not waste their time and respect why they chose to do that.

And that means the first screen they read should cover all the basics if possible.

tutorial vs reference https://diataxis.fr/
1. systemctl already supports logging output, and it's already overloaded.

2. The options are not filtered, so useful options like '--lines' are lost. E.g. what other options apply to "systemctl status"? The systemd documentation, in general, is a mess. It's a good _reference_ documentation (like 'man') but not a good guide.

3. Network filesystems exist. And they can become unavailable for a time.

> 3. Network filesystems exist. And they can become unavailable for a time.

See [1]

> the same applies to remote file system mounts. If you want them to be mounted only upon access, you will need to use the x-systemd.automount parameters. In addition, you can use the x-systemd.mount-timeout= option to specify how long systemd should wait for the mount command to finish. Also, the _netdev option ensures systemd understands that the mount is network dependent and order it after the network is online.

> You may also specify an idle timeout for a mount with the x-systemd.idle-timeout flag.

[1] https://wiki.archlinux.org/title/Fstab#Automount_with_system...
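Put together, the options quoted above would end up on a single fstab line. A minimal sketch, assuming an NFS share: the helper name `fstab_automount_line`, the host `server.example`, the paths, and both timeout values are made up for illustration; only the option names come from the quoted wiki text.

```shell
# Hypothetical helper: print one fstab line for an on-demand network mount.
# Arguments: source, mount point, filesystem type.
fstab_automount_line() {
    # _netdev                  : mark the mount as network dependent
    # x-systemd.automount      : mount only upon first access
    # x-systemd.mount-timeout= : how long to wait for the mount command (example: 30 seconds)
    # x-systemd.idle-timeout=  : idle timeout for the mount (example: 1min)
    printf '%s %s %s %s 0 0\n' "$1" "$2" "$3" \
        "_netdev,x-systemd.automount,x-systemd.mount-timeout=30,x-systemd.idle-timeout=1min"
}

fstab_automount_line server.example:/export /mnt/share nfs
```

The printed line is meant to be reviewed by hand before it goes anywhere near /etc/fstab.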

> Note that this option can only be used in /etc/fstab, and will be ignored when part of the Options= setting in a unit file.

What did I say about consistency (or more precisely, the lack thereof)?

Systemd got better with time and I got better with it over time, which makes it acceptable for me now. I still miss SMF from Solaris years later though. I'm sure there are better systems out there but when the ubiquity is not there it's really hard to adopt them especially in corporate environments. And then you have to learn 2 things if you want to use something else at home, which is already too much for me...
I liked SMF as well, but I do admit I “cheated” by using a website to make the XML service manifests.
+1 I think such writing would find its audience.

What I would like to see is something that is to systemd what PipeWire is to PulseAudio.

Before PulseAudio, getting audio to work properly was a struggle. PA introduced useful abstractions, but when it was rolled out it was a buggy mess. Eventually it got good over time. Then PipeWire comes in, and it does more with less. The transition was so smooth, I did not even realize I had been running it for a while; one day I just noticed it in the update logs.

systemd now works well enough, but it would be nice to get rid of that accumulated cruft.

systemd and pulseaudio are by the same guy (avahi too). He just writes shit software that sort of works.
Also he has no regard for breaking userspace, to the point of needing to get scolded by Linus. But some ideas are good and there is a lot of pioneering work that moves the needle. The trajectories of PulseAudio and systemd are similar, it just needs cleaning up. PulseAudio got fixed up by PipeWire, whereas systemd has yet to reach that stage of its lifecycle.
Afaik one of the main problems with his software is that it tends to sacrifice ergonomics in the 99% common cases for some obscure theoretical observation.

This is of course about tradeoffs and about the complexities of the problems you're solving, but his software is full of choices that only make sense if you prioritize elegant code over elegant software, only to then grow into something that is neither.

Lennart worked at Red Hat when he was developing systemd. Red Hat's largest customers often have wacky, weird requirements that you would have never thought of unless you were in that specific customer's situation.
Good point.
There's a podcast [1], which features him as a guest to talk about Linux in general. The main impressions I got from it: he is very confused about what UNIX is and he apparently despises UNIX.

I think he's well suited for his new employer (Microsoft).

[1] (in German) https://cre.fm/cre209-das-linux-system

The reason that PulseAudio did not work at first (and PipeWire worked out of the gate) is that PulseAudio and PipeWire use a lot of relatively newer kernel audio APIs that previous sound daemons did not use. Therefore driver implementations of those APIs were untested and hence buggy when PulseAudio first started using them.

When people say "PulseAudio is not a broken mess anymore", what they really mean is "my audio driver is not a broken mess anymore".

The inconsistency comes from the author thinking "All this init stuff is ancient, and filled with absurd work arounds, hacks, and inconsistencies. I'll fix that!". Then as time passes discovering that "Oh wait, I should add a hack for this special case, and this one, and this one, guess these were really needed!" as bug reports come in over the years.

To be fair, this could happen to any of us, especially early in career. But the real hubris is presuming that things are, as they are, without cause or reason. Along with never really knowing how things actually worked. Or why.

I envision a layperson (which is the sort of understanding the author had of modern init systems when starting on systemd). Said person walks up to a complex series of gears, thinks a peg is just there for no reason and looks unused, and pulls it out. Only to have the whole mess go bananas. You can follow this logic with all of the half-baked, partially implemented services like timekeeping, DNS, and others that barely work correctly, and go sideways if looked at funny.

I think if the author took their current knowledge, and this time wrote it from scratch, it could be far better.

However there still seems to be a chip on their shoulder, with an idea that "I'll fix Linux!" still, when in reality these fixes are just creating immense complication with very minimal upside. So any re-write would likely still be an over-complicated contraption.

When a complex system cannot be meaningfully reduced, another approach might be trying to reduce scope.

Current areas include managing services on a server, managing a single-user laptop, and enterprise features for fleet of devices/users.

There is some overlap at the core where sharing code is useful, but it feels like way more complexity than needed gets shipped to my laptop. I wonder how much could be shaved off when focusing only on a single scenario.

I like openrc for a laptop or workstation. Writing a service is as easy as writing a systemd file (with fewer options of course, but I never really wanted those).
Yet another approach is exposing internal state.

That way you turn a very complex system into a set of much simpler artificial systems whose interactions you can control.

On your example, that would mean having different kinds of configuration options that go for each of those scenarios, but still all on the same software.

One can argue that systemd tries this (for example, there are many kinds of services). But in many cases, it does the complete opposite of both this and reducing scope.

Still, I don't think init systems are a wicked problem (so they don't need advanced solutions for managing complexity). The wickedness is caused by systemd's decision to do everything.

> The inconsistency comes from the author thinking "All this init stuff is ancient, and filled with absurd work arounds, hacks, and inconsistencies. I'll fix that!". Then as time passes discovering that "Oh wait, I should add a hack for this special case, and this one, and this one, guess these were really needed!" as bug reports come in over the years.

Don't forget the best one: "We don't support that uncommon use case, we will not accept nor maintain patches to support it, and you shouldn't do it that way anyway, and we are going to make it impossible in the future" -- to something that's worked well for decades.

“Those who do not understand Unix are condemned to reinvent it, poorly.” — Henry Spencer, 1987
I disagree. As much as I dislike a lot of stuff in systemd, it was the _first_ init system that actually cares about reliability.

It evolved organically so it's a bit of a mess as a result, but it's the fate of most long-term projects (including Linux).

Reliability!!

It's the least reliable init system I've ever used!

Yes, reliability. Systemd was the first mainstream init system to deal with service confinement via cgroups, service readiness protocols, and true event-based service activation.

To give you some perspective, at that time, upstart was using ptrace() to detect the double-forking to allow services to be tracked.
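As a rough illustration of the "service readiness protocol" point: with a `Type=notify` unit, systemd treats the service as started only once the process reports in, which a shell-based service could do via `systemd-notify --ready`. A minimal sketch; the helper name `notify_ready` and the fallback message are assumptions, and whether the message is accepted from a helper process may depend on the unit's `NotifyAccess=` setting.

```shell
# Hypothetical helper: report readiness to the service manager, but only when
# systemd handed us a notification socket (NOTIFY_SOCKET) and the
# systemd-notify tool is installed. Otherwise do nothing harmful.
notify_ready() {
    if [ -n "${NOTIFY_SOCKET:-}" ] && command -v systemd-notify >/dev/null 2>&1; then
        systemd-notify --ready
    else
        echo "no notify socket or systemd-notify available; skipping readiness signal" >&2
    fi
}
```

A service script would call `notify_ready` after its own setup is finished, so that units ordered after it start only once it can actually serve requests.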

Tracking services doesn't provide for a reliable init system. From my perspective, the only job for init is to control startup and shutdown of services.

Not to keep them running. Not to restart them. Not to track them.

I have logs, and monitoring software for that. I have loads of applications to do that, if I wish. But regardless of what you believe an init system is for, the reliability of it is separate from "keeping apps that are so crappy they crash, running".

> I want to write a systemd haters handbook.

Why? Systemd really fits the Unix haters handbook. It is as anti-Unix as can be (one command to rule them all, binary logs, etc.).

In the end it really seems that the mantra "GNU is not UNIX" is true. Just look at GNU/Linux: pulseaudio, systemd, polkit, wayland, the big, fat linux kernel.

For a brief period of time, binary configs[0] were a thing. In mobile world only, but still. It wasn't that people generally wanted them, but because random seek I/O latency on early mobile devices (and especially on their eMMC storage devices) was atrocious.

Opening up tens or hundreds of XML config files for resync was disgustingly slow. I've developed software on Maemo and Scratchbox; the I/O wait for on-device config changes was a real problem. So of course someone came up with a modified concept of Windows registry - a single, binary format config storage, with a suitably "easy" API. As a result you'd sacrifice write/update latency for the cases where you wanted to modify configurations and gain a much improved read/refresh latency when reading them up.

Of course that all broke down when reading a single config block required reading the entire freaking binary dump and the config storage itself was bigger than the block device cache. Turns out that if you give app developers a supposedly easy and low-friction mechanism to store app configs, their respective PMs will go wild and demand that everything be configurable. Multiply by tens, even low hundreds of apps, each registering an idle-loop callback to re-read their configs to guarantee they would always have the correct settings ready. A system intended to improve config load/read times ended up generating an increased demand for already constrained read I/O.

0: https://wiki.gnome.org/Projects/dconf

GNU promotes Shepherd instead of SystemD.
All those points could be fixed with a wrapper "systemd2" but I definitely see your points.

I like thinking of the minimum set of changes required to fix a problem, and this could help; you could probably LLM most of it in less than 30 minutes.
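A wrapper along those lines could be as small as one shell function. This is only a sketch under assumptions: the name `sc`, the `logs` subcommand, and the `DRY_RUN` switch are all invented here; the only real commands involved are `systemctl` and the `journalctl -xeu` invocation from the top post.

```shell
# Hypothetical wrapper: "sc logs UNIT" maps to journalctl, everything else
# is passed to systemctl unchanged. Set DRY_RUN=1 to print instead of run.
sc() {
    if [ "$#" -lt 1 ]; then
        echo "usage: sc logs UNIT | sc SYSTEMCTL_ARGS..." >&2
        return 2
    fi
    if [ "$1" = "logs" ]; then
        if [ "$#" -lt 2 ]; then
            echo "usage: sc logs UNIT" >&2
            return 2
        fi
        shift
        # Rewrite the argument list into the journalctl form from the thread.
        set -- journalctl -xeu "$@"
    else
        set -- systemctl "$@"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With this, `sc logs nginx` would run `journalctl -xeu nginx`, while something like `sc status --lines=50 nginx.service` goes straight through to systemctl.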

I would like to subscribe to your newsletter... no but really if you ever do get around to writing that I want to read it. Ping me somehow, my Gmail username is the same as my HN username. Happy writing!