Hmm, I was expecting "Sorry we made a mistake, we'll switch the option to systemd.debug".
> At that point there is simply no other option for that, because persistent storage is not available
This was about overloading an already used option by another team building a core system component -- the kernel. A debug for kernel's command line is for the kernel.
> It's the option an admin can specify which tells him why the system doesnt boot,
Ok so he does and now his system also doesn't boot but now it is either because of the original problem or because it gets flooded by systemd logs.
And then, he goes and posts to the kernel mailing lists saying how kernel is a piece of shit.
> That turns this into some kind of power game, which I am totally not interested in.
also
> We are putting together an OS here after all, not just a kernel, and a kernel is just one component of the OS among many, and ultimately an implementation detail.
I think due to their attitude towards both testing, towards the kernel community, they shouldn't be building core system components. And did he just write that kernel is just "an implementation detail?".
Maybe systemd was a mistake. Integrating and dumping socket acceptors, logging, and the whole kitchen sink into one component. So when it breaks it really breaks.
My system isn't booting. I don't know why. I'll pass "debug" to find out. Right here, and now, the fact is that the problem could be in many places inside or outside the kernel including (but not limited to) systemd. Requiring people to understand the current low level structure of their OS just to get some extra debug reporting on a failed boot is exposing implementation details.
Google+ really doesn't get retweeting, does it. That whole post is written by someone who isn't the one in the photos at the top. A single "originally shared" link points to the original.
Retweeting was a genius move by twitter, a post by someone you don't follow appears in your timeline and it looks like any other and attribution is perfectly captured.
I have a fair amount of experience with Linux, but I'm a little lost by what's going on here. Can someone out there dumb this down a bit for an init newbie?
You can pass the kernel options before booting. i.e. "quiet" or "debug". The Kernel exposes these options via /proc/cmdline.
systemd parses /proc/cmdline and if "debug" is set, it also raises it's log level to debug.
Then systemd logs so much to the kernel log that it gets flooded and fails to boot.
Edit: And story wise, systemd says yeah, we're not going to fix this or rename the debug flag for systemd to "systemd.debug" because blah.
So basically, systemd is catching the common keyword 'debug' from a command that being given to something else entirely, and in turn enabling its own logging, which floods the system. The command wasn't actually meant for systemd, but for the kernel. Is that correct?
You're correct, but from a user perspective it's a totally valid thing to do. When your system (or pretty much any software) doesn't work, you want to pass some "debug" flag so that the maximum information is logged and you can parse that later. You don't want to pass one debug flag per component, you want it to be global (just like when you say "quiet", you want every component to be quiet).
Pretty much. Though Linus says he's okay with other people parsing these options, that's why they expose them. But of course you should do it in a sane way.
And systemd refuses to patch their code to namespace their flag and listen for "systemd.debug" instead of the generic debug flag that is primarily meant for the kernel.
The kernel has a debug option that can be specified at boot time to increase its logging verbosity. systemd looks at the kernel command line to do the same and starts writing its own verbose logging out to the kernel's message buffer. Writing out to the kernel message buffer is done for pragmatic reasons: generally if you're debugging init problems they're happening when the machine is first starting so you would like to see the error messages on the serial console/virtual console.
The bug that started this all came up when someone had a broken system that was causing systemd to spew out tons of log messages. This was overwhelming the kernel message system and causing things to time out/etc and resulted in an unbootable system. When this problem got opened upstream, the systemd maintainer said "not my problem."
There are a couple of ways forward here: systemd can limit its own output and say "my bad"[1], systemd can begin looking for its own debug flag in the kernel command line instead of using the kernel's own debug flag or the kernel can implement rate limiting on the message system[2] to keep userspace from shooting itself in the foot.
Torvalds really doesn't like the systemd maintainer's "not my problem" attitude so he's threatened to no longer accept pull requests from him for any code he writes. Torvalds wrote a really good post in the linked Google+ comments that describes his feelings on the whole thing:
"I don't mind people piggy-backing on some fairly obvious generic term like "debug" per se. I don't know if the old init scripts did that, but I do know they did it for "quiet", which is basically the reverse of "debug".
What I mind is people closing bugs and not admitting mistakes. If Kay had even said "sorry, the excessive output was a bug in systemd, it's already fixed in current -git", that would have been a valid reason to close the bug.
And for the people thinking this is a storm in a teacup: this is not the first time Kay has done this, which is why I personally get so frustrated.
Kay has done the exact same thing with major bugs that were not fixed anywhere else, and that caused machines to fail at boot time, and Kay happily pointed the finger elsewhere for months at a time and closed bugzilla entries.
Side explanatory note: and it's because of that known history of abusive behavior that I would prefer systemd now use "systemd.debug".
The old init scripts may or may not have parsed "debug", but we never had any reason to care. Now we do, and people are (I think) understandably upset that systemd not only screwed up, but then the people involved weren't even willing to say "sorry" about it but instead go "uh, it wasn't our bug, deal with it"."
> systemd can limit its own output and say "my bad"[This happened]
Did it? I didn't find any such commit. IMHO it's the correct fix. /dev/kmsg is privileged and systemd should rate-limit what it writes there after giving priority to more critical messages.
Okay, I think 6a0f1f6d5af7c7300d3db7a0ba2b068f8abd222b "sd-event: rework API to support CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM, too" removed the dual_timestamp_is_set assert. It looks like it was fixed by accident, and there's zero acknowledgement that flooding /dev/kmsg could be a problem on systemd's side.
In a nutshell, a Linux kernel developer wants to limit the number of debugging messages he is obliged to read and act on, and Linus Torvalds thinks that's counterproductive, and Linus generally says what he thinks.
I wonder how this whole thing would have gone down if the original bug report had simply described the problem rather than demanded their preferred solution to what they thought the problem was.
Linus and others keep acting like the bug was "systemd floods the kernel logs when I pass the debug flag" but it was actually it was "Do not parse "debug" command line ".
The obvious answer to the first is " Sorry about that, the bug is fixed in master branch".
The obvious answer to the second is "No, that's what it's for"
I think Linus has been pretty clear on both topics (fix A, and if you won't, fix B) and on general policy (don't break it for users or I don't want to work with you in the kernel). Yeah, the reporter could have framed it better from the start, but the developers are supposed to be adults and handle imperfect bug reports too.
FWIW, I don't really consider the first issue fixed. The trigger was lost in a refactoring that can't be backported, and there are probably other ways to trigger the issue.
to be clear (I am not english). Ratelimit means drop some logs. Lennart Poettering is commplaining because he wants the kernel to be happy with him flooding the kernel with logs.
When I debug software, I do not want to lose logs and I do not want software to produce so much logs it is impossible to use.
God damn it, now in addition to fighting off pulseaudio and systemd from being on my Gentoo box, in a few years I'm going to have to fight off the Lennart kernel...
>Correct. I don't mind people piggy-backing on some fairly obvious generic term like "debug" per se. I don't know if the old init scripts did that, but I do know they did it for "quiet", which is basically the reverse of "debug".
>What I mind is people closing bugs and not admitting mistakes. If Kay had even said "sorry, the excessive output was a bug in systemd, it's already fixed in current -git", that would have been a valid reason to close the bug.
So Torvalds is saying he doesn't like dictatorial project maintainers who reply in an abrupt and abrasive manner to contributors? He is so concerned about the lack of politeness and professional discourse that he just had to raise this issue? Hilarious.
No. He doesn't like project maintainers who reply in an abrupt and dismissive way to users with bugs. In this, Linus is very consistent and not hypocritical at all.
> At that point there is simply no other option for that, because persistent storage is not available
This was about overloading an already used option by another team building a core system component -- the kernel. A debug for kernel's command line is for the kernel.
> It's the option an admin can specify which tells him why the system doesnt boot,
Ok so he does and now his system also doesn't boot but now it is either because of the original problem or because it gets flooded by systemd logs.
And then, he goes and posts to the kernel mailing lists saying how kernel is a piece of shit.
> That turns this into some kind of power game, which I am totally not interested in.
also
> We are putting together an OS here after all, not just a kernel, and a kernel is just one component of the OS among many, and ultimately an implementation detail.
I think due to their attitude towards both testing, towards the kernel community, they shouldn't be building core system components. And did he just write that kernel is just "an implementation detail?".
Maybe systemd was a mistake. Integrating and dumping socket acceptors, logging, and the whole kitchen sink into one component. So when it breaks it really breaks.