Response to systemd's use of the debug flag (plus.google.com)
38 points by asimov42 4452 days ago
10 comments

Hmm, I was expecting "Sorry we made a mistake, we'll switch the option to systemd.debug".

> At that point there is simply no other option for that, because persistent storage is not available

This was about overloading an option already used by another team building a core system component -- the kernel. A "debug" on the kernel's command line is for the kernel.

> It's the option an admin can specify which tells him why the system doesnt boot,

OK, so he does, and his system still doesn't boot, but now it is either because of the original problem or because it gets flooded by systemd logs.

And then he goes and posts to the kernel mailing lists saying how the kernel is a piece of shit.

> That turns this into some kind of power game, which I am totally not interested in.

also

> We are putting together an OS here after all, not just a kernel, and a kernel is just one component of the OS among many, and ultimately an implementation detail.

I think that, given their attitude towards both testing and the kernel community, they shouldn't be building core system components. And did he just write that kernel is just "an implementation detail?".

Maybe systemd was a mistake: integrating and dumping socket acceptors, logging, and the whole kitchen sink into one component. So when it breaks, it really breaks.

>And did he just write that kernel is just "an implementation detail?".

In theory, systemd could be made to work on BSDs, Hurd, Minix, or whatever.

In reality, I'd sooner expect systemd to provide its own kernel, but you never know.

The context for that remark is:

My system isn't booting. I don't know why. I'll pass "debug" to find out. Right here, and now, the fact is that the problem could be in many places inside or outside the kernel including (but not limited to) systemd. Requiring people to understand the current low level structure of their OS just to get some extra debug reporting on a failed boot is exposing implementation details.

Google+ really doesn't get retweeting, does it. That whole post is written by someone who isn't the one in the photos at the top. A single "originally shared" link points to the original.

Retweeting was a genius move by Twitter: a post by someone you don't follow appears in your timeline, it looks like any other post, and attribution is perfectly captured.

Totally agree. As things are currently, wouldn't it be best if the story link was to the actual original post?

https://plus.google.com/+LennartPoetteringTheOneAndOnly/post...

My guess is that the comments on this one led to it being posted here (there is discussion from Linus and others down there.)
Yup. And Kay Sievers has another clone of the story, with a (quite unapologetic) comment of his own: https://plus.google.com/+KaySievers/posts/3cWXzYqB6MB
I have a fair amount of experience with Linux, but I'm a little lost by what's going on here. Can someone out there dumb this down a bit for an init newbie?
You can pass options to the kernel at boot, e.g. "quiet" or "debug". The kernel exposes these options via /proc/cmdline.

systemd parses /proc/cmdline and, if "debug" is set, also raises its log level to debug. Then systemd logs so much to the kernel log that the log gets flooded and the system fails to boot.
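To make the mechanism concrete, here is a minimal Python sketch of what "parsing /proc/cmdline" amounts to. It is not systemd's code: the function names are hypothetical, and the parser deliberately ignores the quoting the kernel allows for values containing spaces. It only shows that any userspace program can read the same line and react to a bare "debug".

```python
from typing import Dict, Optional


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line (Linux only; not called in the example)."""
    with open(path) as f:
        return f.read().strip()


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a command line into {key: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def reacts_to_kernel_debug(params: Dict[str, Optional[str]]) -> bool:
    # The behaviour described above: a userspace component switching on
    # its own verbose logging because the kernel's generic flag is present.
    return "debug" in params


example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet debug"
params = parse_cmdline(example)
print(params["root"], reacts_to_kernel_debug(params))  # /dev/sda1 True
```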

Edit: And story-wise, systemd says, yeah, we're not going to fix this or rename the debug flag for systemd to "systemd.debug" because blah.

And Torvalds gets pissed.

So basically, systemd is catching the common keyword 'debug' from a command line that is being given to something else entirely, and in turn enabling its own logging, which floods the system. The option wasn't actually meant for systemd, but for the kernel. Is that correct?
You're correct, but from a user perspective it's a totally valid thing to do. When your system (or pretty much any software) doesn't work, you want to pass some "debug" flag so that the maximum information is logged and you can parse that later. You don't want to pass one debug flag per component, you want it to be global (just like when you say "quiet", you want every component to be quiet).
Pretty much. Though Linus says he's okay with other people parsing these options, that's why they expose them. But of course you should do it in a sane way. And systemd refuses to patch their code to namespace their flag and listen for "systemd.debug" instead of the generic debug flag that is primarily meant for the kernel.
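The namespacing fix can be sketched the same way. Assuming the "systemd.debug" name proposed in the thread (the helper names are hypothetical, and this is not how systemd actually implements its options), a component that only looks at parameters under its own prefix is unaffected by the kernel's bare "debug":

```python
from typing import Set


def component_flags(cmdline: str, prefix: str) -> Set[str]:
    """Return parameter names that start with `prefix`, with the prefix stripped."""
    flags: Set[str] = set()
    for token in cmdline.split():
        key = token.partition("=")[0]  # ignore any "=value" part
        if key.startswith(prefix):
            flags.add(key[len(prefix):])
    return flags


def component_debug_enabled(cmdline: str, prefix: str = "systemd.") -> bool:
    # Only the namespaced form turns on this component's debug logging.
    return "debug" in component_flags(cmdline, prefix)


print(component_debug_enabled("root=/dev/sda1 quiet debug"))          # False
print(component_debug_enabled("root=/dev/sda1 quiet systemd.debug"))  # True
```

The kernel keeps sole ownership of "debug", while the user still has a single, predictable switch for the component.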
The kernel has a debug option that can be specified at boot time to increase its logging verbosity. systemd looks at the kernel command line to do the same and starts writing its own verbose logging out to the kernel's message buffer. Writing out to the kernel message buffer is done for pragmatic reasons: generally if you're debugging init problems they're happening when the machine is first starting so you would like to see the error messages on the serial console/virtual console.

The bug that started this all came up when someone had a broken system that was causing systemd to spew out tons of log messages. This was overwhelming the kernel message system and causing things to time out/etc and resulted in an unbootable system. When this problem got opened upstream, the systemd maintainer said "not my problem."

There are a couple of ways forward here: systemd can limit its own output and say "my bad"[1], systemd can begin looking for its own debug flag in the kernel command line instead of using the kernel's own debug flag or the kernel can implement rate limiting on the message system[2] to keep userspace from shooting itself in the foot.
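The third option, rate limiting on the message system, boils down to something like the interval/burst limiter below: at most `burst` messages are accepted per `interval` seconds, and the rest are dropped and counted. This is a generic Python sketch with made-up parameter values. It is loosely in the spirit of the interval/burst limits the kernel already applies to some of its own messages, but it is not the kernel's implementation.

```python
from typing import Optional


class RateLimiter:
    """Accept at most `burst` messages per `interval` seconds (illustrative)."""

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval
        self.burst = burst
        self.window_start: Optional[float] = None
        self.printed = 0  # messages accepted in the current window
        self.missed = 0   # messages dropped so far (never reset here)

    def allow(self, now: float) -> bool:
        # Open a fresh window on the first call or once the old one expires.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


# Simulate a flood: 100 messages within one second, 5 s window, burst of 10.
limiter = RateLimiter(interval=5.0, burst=10)
accepted = sum(limiter.allow(now=i * 0.01) for i in range(100))
print(accepted, limiter.missed)  # 10 90
```

The design choice is the window size and burst: too small and useful debug output is lost, too large and the flood still gets through.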

Torvalds really doesn't like the systemd maintainer's "not my problem" attitude so he's threatened to no longer accept pull requests from him for any code he writes. Torvalds wrote a really good post in the linked Google+ comments that describes his feelings on the whole thing:

"I don't mind people piggy-backing on some fairly obvious generic term like "debug" per se. I don't know if the old init scripts did that, but I do know they did it for "quiet", which is basically the reverse of "debug".

What I mind is people closing bugs and not admitting mistakes. If Kay had even said "sorry, the excessive output was a bug in systemd, it's already fixed in current -git", that would have been a valid reason to close the bug.

And for the people thinking this is a storm in a teacup: this is not the first time Kay has done this, which is why I personally get so frustrated.

Kay has done the exact same thing with major bugs that were not fixed anywhere else, and that caused machines to fail at boot time, and Kay happily pointed the finger elsewhere for months at a time and closed bugzilla entries. 

Side explanatory note: and it's because of that known history of abusive behavior that I would prefer systemd now use "systemd.debug".

The old init scripts may or may not have parsed "debug", but we never had any reason to care. Now we do, and people are (I think) understandably upset that systemd not only screwed up, but then the people involved weren't even willing to say "sorry" about it but instead go "uh, it wasn't our bug, deal with it"."

----

[1] This happened

[2] This will likely happen

> systemd can limit its own output and say "my bad"[This happened]

Did it? I didn't find any such commit. IMHO it's the correct fix. /dev/kmsg is privileged and systemd should rate-limit what it writes there after giving priority to more critical messages.
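A sketch of "rate-limit, but give priority to more critical messages", assuming syslog severities (0 = emerg ... 7 = debug): errors and anything more severe always pass, while lower-priority lines share a per-window budget. Nothing is written to /dev/kmsg here; `format_kmsg()` only builds the "<N>"-prefixed line that such a write would carry. The class, function, and identifier names are hypothetical.

```python
from typing import List, Tuple

LOG_ERR = 3    # syslog severity "err"
LOG_DEBUG = 7  # syslog severity "debug"


class PriorityLimiter:
    def __init__(self, budget: int, always_pass: int = LOG_ERR) -> None:
        self.budget = budget            # lower-priority lines allowed per window
        self.always_pass = always_pass  # this severity or more severe is never dropped
        self.used = 0
        self.dropped = 0

    def new_window(self) -> None:
        self.used = 0

    def allow(self, severity: int) -> bool:
        if severity <= self.always_pass:  # err, crit, alert, emerg
            return True
        if self.used < self.budget:
            self.used += 1
            return True
        self.dropped += 1
        return False


def format_kmsg(severity: int, ident: str, text: str) -> str:
    # "<N>" prefix carries the syslog priority of the record.
    return f"<{severity}>{ident}: {text}"


limiter = PriorityLimiter(budget=3)
messages: List[Tuple[int, str]] = [(LOG_DEBUG, f"debug {i}") for i in range(10)]
messages.append((LOG_ERR, "mount failed"))
kept = [format_kmsg(sev, "systemd[1]", text) for sev, text in messages if limiter.allow(sev)]
print(len(kept), limiter.dropped)  # 4 7
```

Even after the debug flood has used up the budget, the error line still gets through.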

I guess I phrased it wrong, I was referring to the bug fix that stopped systemd from all of the spam and not internal rate limiting.
Okay, I think 6a0f1f6d5af7c7300d3db7a0ba2b068f8abd222b "sd-event: rework API to support CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM, too" removed the dual_timestamp_is_set assert. It looks like it was fixed by accident, and there's zero acknowledgement that flooding /dev/kmsg could be a problem on systemd's side.
Very explanatory, thank you very much.
In a nutshell, a Linux kernel developer wants to limit the number of debugging messages he is obliged to read and act on, and Linus Torvalds thinks that's counterproductive, and Linus generally says what he thinks.

In a nutshell. :)

I wonder how this whole thing would have gone down if the original bug report had simply described the problem rather than demanded their preferred solution to what they thought the problem was.

Linus and others keep acting like the bug was "systemd floods the kernel logs when I pass the debug flag", but it was actually "Do not parse 'debug' command line".

The obvious answer to the first is "Sorry about that, the bug is fixed in the master branch".

The obvious answer to the second is "No, that's what it's for"

I think Linus has been pretty clear on both topics (fix A, and if you won't, fix B) and on general policy (don't break it for users or I don't want to work with you in the kernel). Yeah, the reporter could have framed it better from the start, but the developers are supposed to be adults and handle imperfect bug reports too.

FWIW, I don't really consider the first issue fixed. The trigger was lost in a refactoring that can't be backported, and there are probably other ways to trigger the issue.

To be clear (English is not my first language): rate limiting means dropping some logs. Lennart Poettering is complaining because he wants the kernel to be happy with him flooding it with logs.

When I debug software, I do not want to lose logs, and I do not want the software to produce so many logs that they are impossible to use.
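One common middle ground is to make the loss visible: drop lines over a budget, but emit a single summary saying how many were dropped. The kernel's own rate limiter prints a similar "callbacks suppressed" note. The Python sketch below is illustrative only; the class name and summary wording are assumptions.

```python
from typing import List


class SuppressingLogger:
    def __init__(self, burst: int) -> None:
        self.burst = burst        # lines kept per window
        self.emitted = 0
        self.suppressed = 0
        self.lines: List[str] = []

    def log(self, text: str) -> None:
        if self.emitted < self.burst:
            self.emitted += 1
            self.lines.append(text)
        else:
            self.suppressed += 1  # count it instead of printing it

    def end_window(self) -> None:
        # Replace the dropped lines with one summary, then reset counters.
        if self.suppressed:
            self.lines.append(f"{self.suppressed} messages suppressed")
        self.emitted = 0
        self.suppressed = 0


logger = SuppressingLogger(burst=2)
for i in range(5):
    logger.log(f"msg {i}")
logger.end_window()
print(logger.lines)  # ['msg 0', 'msg 1', '3 messages suppressed']
```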

> A kernel is just one component of the OS among many, and ultimately an implementation detail.

-- Lennart Poettering

God damn it, now in addition to fighting off pulseaudio and systemd from being on my Gentoo box, in a few years I'm going to have to fight off the Lennart kernel...
Um, without Linux-specific kernel details, systemd would not exist.

This, along with journalctl taking minutes to show logs after months of normal usage... systemd is out of my Arch install.

Thanks, I am looking into OpenRC, as well as GNU dmd.
I am surprised by the hostility displayed by the systemd guys. I was expecting them to admit they were wrong, and things would be back on track.

With such a reply, I wonder what effect this will have on kdbus making forays into the kernel (not that I am looking forward to it ;).

"Here's a nice note for all the reporters who are emailing me, go suck it."

"Yes it's titillating, and drives page views, but really, is that the best use of an Liberal Arts degree?"

Ah, the perfect guy to represent our profession.

Thankfully we have systemd to manage these kinds of module dependency startup problems in Linux now.
>Correct. I don't mind people piggy-backing on some fairly obvious generic term like "debug" per se. I don't know if the old init scripts did that, but I do know they did it for "quiet", which is basically the reverse of "debug".

>What I mind is people closing bugs and not admitting mistakes. If Kay had even said "sorry, the excessive output was a bug in systemd, it's already fixed in current -git", that would have been a valid reason to close the bug.

So Torvalds is saying he doesn't like dictatorial project maintainers who reply in an abrupt and abrasive manner to contributors? He is so concerned about the lack of politeness and professional discourse that he just had to raise this issue? Hilarious.

>> What I mind is people closing bugs and not admitting mistakes.

> So Torvalds is saying he doesn't like dictatorial project maintainers who reply in an abrupt and abrasive manner to contributors?

You have quoted what he did say and made up a position that he did not.

No. He doesn't like project maintainers who reply in an abrupt and dismissive way to users with bugs. In this, Linus is very consistent and not hypocritical at all.