Thoughts on the Systemd Root Exploit (agwa.name)
112 points by uggedal 3429 days ago
8 comments

So he closes with

> Unfortunately, the lock-in they're creating will deprive people of the ability to vote with their feet and switch to better alternatives.

This might sound like a dumb question, and I'm not saying I agree with him, but if I did, I'd vote with my feet by... what? Don't most distros use it now? I'm genuinely curious if this is a practical option for me, though I'm not likely to do it.

FreeBSD, OpenBSD, NetBSD, Illumos, etc. Chances are, you don't even need to tie yourself to the Linux kernel. Lots of OSes out there don't inflict systemd on their users.
If you don't like systemd due to it being a lock-in, why on earth would you go with the BSDs, which are all lock-ins with their respective implementations of the lower-level plumbing that systemd is trying to standardize across Linux?
The BSDs have far less lock-in with the default components than Linux distributions usually do. For example, I dislike the default syslog daemon, so, I disabled the default, and used my own. This involved adding two lines to /etc/rc.conf --

    syslog_ng_enable="YES"
    syslogd_enable="NO"
and that was it. Unlike systemd's way of doing it, the default syslog facility isn't started at all, doesn't run, period. Pretty much all the other low-level services -- crond, dhcpd, ntpd, etc. -- are the same way. This is one of the many failings in how systemd operates. One should have reasonable defaults, but at the same time, if the defaults don't work, it should be easy to change them.
Well, journald is pretty much the one component of systemd that is very difficult to replace, but even so you can easily forward to another syslog daemon of your choice.

As for the rest, it's very much the same as with the BSDs: they all support their own versions of low-level components and only them, and these can be changed by the user with compatible tools of their choice.

Forwarding to another process still means that if journald goes belly up, the logging just died...
I think people don't like systemd because it (in their view) has major flaws, and the lock-in prevents other systems from being tried.
You don't have to use systemd on Gentoo either. I don't.
Gentoo has a firm stance towards admin/user choice, and is willing to take action (eudev) to uphold the ability to choose.

This is in stark contrast to certain devs over at Fedora and Gnome who hold that choice is bad. Just observe a certain site ebassi maintains...

Thanks for mentioning illumos, I love running it. I'm having a great time with Joyent running a Ghost instance for my blog
Reports are that Devuan works fine. Gentoo reportedly works well, and then there's my favorite distro (for other reasons besides systemd), Slackware.
As a Gentoo user I can confirm it works well without systemd, using OpenRC. You can use systemd on Gentoo if you want; I personally avoid it.
As an upside, OpenRC is simpler and even more configurable than systemd whilst being almost as fast, having had dependency-based boot since before systemd was conceived.
Wait, does that mean other init systems did not have dependency-based boot? I have always been a Gentoo user, so I have always used OpenRC. If that's the case, it explains why I never understood why systemd was so "revolutionary".

I do find it harder to configure and understand than OpenRC. After I edit some configuration I can just restart the service and it's working; with systemd I have to run more than one command, and they are not intuitive at all.

There have been quite a few. But until Ubuntu introduced Upstart (and I am sure someone will claim it was not dependency-based), few if any reached mainstream usage. Keep in mind that RHEL6 actually uses Upstart as pid1, but you would not guess it, as Upstart deals with sysv rc "just fine".

Thing is that systemd is led by some of the most myopic developers in the Linux community. If it is not in use by RHEL or Fedora, it does not exist. The end result is a litany of NIH projects that could have been avoided had they looked around just a little bit.

The init system is easily replaceable. The problem is things like polkit, which now depend on systemd (logind).
I do have polkit installed. There is a USE flag "systemd" that I have disabled, and so far I have had no problems: it keeps systemd away from my system, and everything works.
Another alternative is Funtoo, a (soft) Gentoo fork, with additional integrated means to keep systemd away.
I guess that's his point. You don't have many realistic options.
Yeah, that's why I'm wondering. As someone above mentioned, I could switch to FreeBSD, OpenBSD, NetBSD, Illumos, etc... but jumping from Ubuntu to those doesn't seem very realistic given the limits on time.
Ubuntu users should switch to Calculate Linux or Slax.
Perhaps systemd should be specified as an API, with a test suite, so that there could be multiple independent implementations that compete for mindshare?
The API/architecture is what people object to so giving them more implementations won't help. And there aren't enough people to maintain existing init systems so creating more unmaintained software isn't a good idea.
>The API/architecture is what people object to so giving them more implementations won't help.

Not really. The majority of the people I've discussed the subject with and myself agree that the problem is not so much the architecture, as it seems to be somewhat decoupled and merely needs a little push. The problem is the APIs are unstable and writing an implementation is pure hell.

Systemd does some pretty great stuff, it's just that the APIs need to settle down a bit and be tiered into feature sets.

Mind you, I have read very little systemd code, there could be way more tight coupling than what I believe there is. It would be good for a dev to chime in.

Void

It is interesting for more than just the use of runit...

I've gone systemd-free on Arch Linux, no problems so far.
Anyone care to explain the downvoting of a simple factual statement? Are the systemd fans angry that other options are viable?

For anyone else wanting to completely purge systemd from an Arch install, here is a good and thorough description; takes less than one hour to switch if you read it thoroughly, about ten minutes if you just do as it says in the code listings.

http://systemd-free.org

configuring freshly installed packages must be a pain
How so?
because presumably they'd be missing init scripts...
> Don't most distros use it now?

I'd submit that, in order to use systemd, you have to give up the term "distro" to describe yourself.

With that in mind, there are only 5 distributions left:

1) Devuan

2) Slackware (including Slax)

3) Void Linux

4) Gentoo (including Calculate Linux)

5) systemd

If you wanted to vote with your feet, and wanted to keep .deb compatibility, you'd move to Devuan. Otherwise, there are four other choices, two of which have LiveCD/LiveUSB install options.

There's nothing stopping you from writing a drop-in systemd replacement in Rust or something that's easier to write safe code for.
I would love for such a thing (and have thought about doing it myself). But as systemd's capabilities grow more and more expansive, and with its lack of modularity, it becomes harder and harder to write a replacement for it. Of course, even replacing only a piece of systemd's functionality (for example its process manager) would still be beneficial, but systemd would continue to be the only real choice for many.
The way you overthrow an overlord you no longer like is by knuckling down, working hard, kicking out a replacement. This is how LibreSSL came around (http://libressl.org).

There's nothing especially wrong with the idea of systemd or the way it's been deployed, but if the code-base is suffering from neglect one way to fix that is to either support the core team, or barring that due to hostility, fork and/or make a work-alike.

we don't want a work-alike. systemd is not necessary, as all the shims for other platforms will attest.
You don't. Other people want systemd to work, to be improved, and to move things forward.
I would suggest that simple common sense might prevent such a thing. It's not without reason that the older, wiser heads always kept initd simple, in part to prevent it becoming an unnavigable monstrosity. The very idea of the "legacy" init implementations creating files, let alone suid root files, is laughable.
"Simple" and "systemd" are two things that really don't belong in the same room.

While sysvinit had a lot of short-comings, at least it was simple.

> systemd was using a magic value (-1) to represent an invalid mode_t value

This is not a correct description of what happened, note. Kay Sievers originally used 0 to represent an invalid value. Lennart Poettering changed this to -1, because 0 is clearly not an invalid value. The bug resulted because he missed changing one of the comparison-against-zero validity checks.

See https://news.ycombinator.com/item?id=13472516

> The bug resulted because he missed changing one of the comparison-against-zero validity checks.

Yes, that's exactly the kind of mistake that happens when you use magic values to represent invalid values instead of distinct types.

You still have not got it right. This is an old class of problem, documented since the 1960s. It's nothing to do with types. It's nothing to do with what language one uses (since it has been documented in languages as diverse as RPG and Pascal). It's the use of unexplained constant literals.

M. Poettering was in fact doing the right thing and correcting the problem, replacing the unexplained constant literals written by M. Sievers ("mode > 0") with named constants ("mode != MODE_INVALID"). It is an example of the problem, one of whose symptoms is the question "Well which of these is the specific constant and which just happens to also be that number?", that M. Poettering missed a "mode > 0" that also needed replacing.

Make no mistake. M. Poettering was actually applying the long-time well-understood fix for this.
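
For anyone who wants to see the shape of it, here is a rough sketch in C. MODE_INVALID is the named constant discussed above; the two function names are made up for illustration and are not systemd's actual code.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Named sentinel meaning "no mode supplied", as discussed above. */
#define MODE_INVALID ((mode_t) -1)

/* Hypothetical leftover check: it still assumes 0 is the sentinel.
 * MODE_INVALID has every bit set, so it passes as a "real" mode. */
bool mode_is_set_stale(mode_t mode) {
    return mode > 0;
}

/* Hypothetical updated check: compares against the named constant. */
bool mode_is_set(mode_t mode) {
    return mode != MODE_INVALID;
}
```

With the stale check, MODE_INVALID counts as a usable mode; with the updated one, it does not, and 0 becomes a legitimate value.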

Here's Ted Holt talking about unexplained constant literals in RPG 4:

* https://www.itjungle.com/2004/08/18/fhg081804-story01/

This same problem, and the approach of turning unexplained literals into named constants to improve maintainability, is explained all over the place, from William Allan Wulf's Fundamental structures of computer science published in 1983, through Niklaus Wirth in the 1975 Proceedings of the IEEE Conference on Reliable Software and Clark and Horning in a SIGPLAN paper in September 1973, to several of Gary Cornell's books on QuickBASIC and Visual BASIC in the 1990s.

This is not a new thing, not language-specific, (clearly!) not addressed by changing language, nor addressed by types.

It actually has a lot to do with types; while it can occur in a few other situations, the two biggest places it occurs:

(1) Where ints (or some other enumerable value unrelated to the problem domain) are used in place of self-describing enumerations because of lack of type system support for enums.

(2) When values within the domain of a type but outside of the domain that would otherwise be generated are used to signal special situations because of lack of type system support for sum types.

(The main other place that they occur is with "breakpoints" within a domain of a normal type, but even these are arguably a workaround for the absence of the combination of sum types and range-constrained types.)
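
To make (2) concrete: C has no sum types, but the idea can be approximated by hand. A minimal sketch, assuming a hypothetical `optional_mode` struct (nothing like this is claimed to exist in systemd), where "no mode" is carried in a separate flag instead of a reserved mode_t value:

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical hand-rolled "optional": validity lives outside the value,
 * so no mode_t bit pattern has to be reserved as a sentinel. */
struct optional_mode {
    bool has_value;
    mode_t value;   /* meaningful only when has_value is true */
};

struct optional_mode mode_none(void) {
    return (struct optional_mode){ .has_value = false, .value = 0 };
}

struct optional_mode mode_some(mode_t m) {
    return (struct optional_mode){ .has_value = true, .value = m };
}

/* Callers name a fallback explicitly instead of comparing to a magic number. */
mode_t mode_or(struct optional_mode m, mode_t fallback) {
    return m.has_value ? m.value : fallback;
}
```

The C compiler will not stop anyone from reading `value` without checking `has_value`; that enforcement is what a language with real sum types adds.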

sigh I lost track of the kernel mitigation for the exploit. I'll go take care of it (i.e., I just wrote the patch and I'll get it reviewed).
Sincere question without picking a side:

Is Systemd being coded with the same level of 'care' as OpenSSL was before being pwned?

These projects aren't very glamorous to work for, plus when everyone's shitting on your work every day it's hard to stay motivated.

Anyone who works on systemd is doing difficult work. We should treat these teams better and give them support rather than just brow-beat them for their mistakes.

Not at all. OpenSSL was suffering from a lack of effort/funding and a desire not to 'break' anything. Systemd has plenty of effort and breaks everything, but a lack of philosophy, a lack of introspection. It's ignoring plenty of hard lessons about security practice and being very Microsoft-y. Massive technical debt which we will collectively pay for for a decade or more.
The buggy TLS heartbeat extension was new code I think though.
From my experience - not that much care. I've found a different (remote) DoS issue in systemd-resolved a few months ago. It was a really obvious parsing issue. Also no CVE or announcement.
If only systemd was written with the level of care OpenSSL had...
> systemd was using a magic value (-1) to represent an invalid mode_t value, and C's type system did not prevent passing it to the mode argument of open

The open syscall should reject unrecognized flags in the mode argument (EINVAL), rather than just truncating down to recognized flags. That would also prevent this specific problem with the sentinel value being used by accident.

In the actual open and openat system calls, the mode parameter is a 16-bit unsigned integer. There are 12 permissions bits and 4 file type bits. There are no "unrecognized flags". All 16 bits have meaning.

* http://lxr.free-electrons.com/source/include/uapi/linux/stat...

* http://lxr.free-electrons.com/source/include/linux/types.h#L...

Bits above 1<<11 (the non-permission bits) are not valid arguments to open(2), so I don't see what your point is. In this context, they are invalid. open(2) should reject values with bits outside of 07777 (== 0x0fff) set, including (mode_t)-1 (== 0xffff).

Here is the specific place where Linux truncates the bogus mode, instead of rejecting it: http://lxr.free-electrons.com/source/fs/open.c#L906

(S_IALLUGO defined here: http://lxr.free-electrons.com/source/include/linux/stat.h#L9 )

This change would fix this class of issue:

    --- a/fs/open.c
    +++ b/fs/open.c
    @@ -889,9 +889,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
            int lookup_flags = 0;
            int acc_mode = ACC_MODE(flags);
    
    -       if (flags & (O_CREAT | __O_TMPFILE))
    +       if (flags & (O_CREAT | __O_TMPFILE)) {
    +               if ((mode & ~S_IALLUGO) != 0)
    +                       return -EINVAL;
                    op->mode = (mode & S_IALLUGO) | S_IFREG;
    -       else
    +       } else
                    op->mode = 0;
    
            /* Must never be set by userspace */
> They have even replaced DNS with a dbus-based protocol, which they "strongly recommend" applications use instead of DNS.

This seems inaccurate. The phrase "strongly recommend" appears once in the manpage, where it is strongly recommended that you use either the standard libc resolver API, with libnss_resolve, or the D-Bus API.

Applications should be using the libc resolver API instead of implementing DNS themselves. There are some applications like Chrome that implement DNS themselves because they care very much about DNS; those applications presumably know how to do all the things systemd-resolved does. Everyone else should get name resolution functionality from libc. That's what you've been supposed to do for decades, and it's a standard UNIX interface. That standard interface supports things like LLMNR that you don't get if you implement DNS yourself.

Unfortunately, the standard UNIX interface is synchronous, which is why libraries like ares or adns exist. If you want to use such a library, you can point it at 127.0.0.53, but you still have the limitations of what can be expressed in DNS. (And you're still using a nonstandard API to speak to libares or libadns.) No API exists that is standard, async, and does everything that libc getaddrinfo() is capable of doing. So systemd built one.
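
For reference, the synchronous libc path looks roughly like this. It is a minimal sketch: `resolve_first` is a made-up helper name, while getaddrinfo(), getnameinfo() and freeaddrinfo() are the standard calls.

```c
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve `host` with the standard libc resolver and write the first
 * result as a numeric string into `buf`. Blocks until resolution finishes.
 * Returns 0 on success or a getaddrinfo()/getnameinfo() error code. */
int resolve_first(const char *host, char *buf, size_t buflen) {
    struct addrinfo hints = {0};
    struct addrinfo *res = NULL;

    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(host, NULL, &hints, &res);   /* synchronous call */
    if (rc != 0)
        return rc;

    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     buf, (socklen_t) buflen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}
```

The caller is stuck inside getaddrinfo() until every configured NSS source has answered or timed out, which is the limitation the async libraries work around.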

That's pretty standard behavior for systemd: implement compatibility interfaces where they exist, recommend them if they're good (systemd explicitly recommends /etc/fstab over writing native mount units, because /etc/fstab is a perfectly good format), implement them anyway if they're not, and write a better API, based on D-Bus, when needed. The latter bit not going through a multi-implementer standards committee isn't great, but it's nowhere near as bad as presented.

Anyway, this is completely irrelevant to the rest of the analysis, which seems absolutely correct, and I'm not sure why the author included this parting shot.

Reading the man page it is actually recommending systemd-resolved over other options.

It says:

- option 1 (recommended): use systemd-resolved API.

- option 2: use glibc API with a glibc NSS module to resolve host names via systemd-resolved.

- option 3 (not recommended): local DNS stub listener on loopback to connect direct request to systemd-resolved.

The author included this part to illustrate how the real issue is that systemd is an unprecedented lock-in. Honestly, an init process implementing a DNS resolver? Where is my kitchen sink?

> Honestly an init process implementing a DNS resolver?

Systemd is a project that manages a large number of low-level services and programs that work together to try to help create a cohesive operating system.

Systemd is also the name of an init program.

These have the same name, but are not the same thing.

Systemd init process does not provide any DNS resolver features. Systemd-resolved, however, does.

Systemd is an init system (don't take my word for it[1]). I seriously doubt there is any need to add a DNS resolver to an init system, especially one that reintroduced vulnerabilities.

This "project" you are talking about is this very init system + feature creep + mission creep + software bloat + interlocked dependencies to force adoption + time.

[1]: http://0pointer.de/blog/projects/systemd.html

Your reference link is nearly 7 years old. From the current project homepage [0]:

"systemd is a suite of basic building blocks for a Linux system. It provides a system and service manager that runs as PID 1 and starts the rest of the system. systemd provides aggressive parallelization capabilities, uses socket and D-Bus activation for starting services, offers on-demand starting of daemons, keeps track of processes using Linux control groups, maintains mount and automount points, and implements an elaborate transactional dependency-based service control logic. systemd supports SysV and LSB init scripts and works as a replacement for sysvinit. Other parts include a logging daemon, utilities to control basic system configuration like the hostname, date, locale, maintain a list of logged-in users and running containers and virtual machines, system accounts, runtime directories and settings, and daemons to manage simple network configuration, network time synchronization, log forwarding, and name resolution."

[0] https://freedesktop.org/wiki/Software/systemd/

To be fair to systemd, systemd-resolved is not an init process. It is its own service that just happens to integrate with systemd and is part of the wider systemd project (with journald, timesyncd, etc.)
To be fair to systemd, it is an init system with a severe case of feature creep, to the point that it now includes a DNS resolver that came with vulnerabilities long since fixed in existing resolvers.
systemd-resolved is a separate package. It is not the init system and not a requirement of the init system.
The official systemd homepage[1] begs to differ: it says systemd is an init system including many features, among which is name resolution:

>systemd (...) provides a system and service manager that runs as PID 1 and starts the rest of the system. (...) Other parts include a logging daemon, (...), log forwarding, and name resolution.

[1]: https://freedesktop.org/wiki/Software/systemd/

-edit- Not sure where it is a separate package. I just checked Debian and Arch; the systemd package contains systemd-resolved. https://packages.debian.org/jessie/amd64/systemd/filelist https://www.archlinux.org/packages/core/x86_64/systemd/

They took the "strongly" from another paragraph, that's true. But the man page does recommend the D-Bus API over libc:

> The native, fully-featured API systemd-resolved exposes on the bus. See the API Documentation[1] for details. Usage of this API is generally recommended to clients as it is asynchronous and fully featured

> No API exists that is standard, async, and does everything that libc getaddrinfo() is capable of doing. So systemd built one.

They built an API that was async and does everything that getaddrinfo is capable of doing. They did not build an API that was standard. They did not build an API that even had the potential to become standard, because many systems do not use D-Bus, and they are not going to add it just for a slightly better DNS resolution API than what already exists.

What could they have done instead? Either or both of:

(1) Implement an extension to the DNS protocol that handles whatever extra bits they need. This is probably the best approach due to the multitude of applications that bypass libc already. Actually, I'm not convinced after reading the manpage that an extension is even necessary... what's the issue with link-local addresses? Can't they just have the DNS server on localhost synthesize records when needed? In fact, based on the rest of the manpage, aren't they already doing that? And what's the issue with Unicode? Can't they translate between DNS punycode and whatever encoding LLMNR uses?

But if an extension to DNS really is needed, it has the potential to be proposed as a standard and eventually become ubiquitous, whereas an ad-hoc replacement interface does not.

(2) (Worse idea, probably:) Propose a libc API that would be an async version of getaddrinfo with whatever enhancements are desired. Implement a portable polyfill library that either calls getaddrinfo on a thread or (if the API has extended functionality in addition to being async) uses their D-Bus stuff, depending on platform.

Admittedly, both options seem more fiddly and more work than 'just' adding some D-Bus calls. But when the existing story for name resolution is largely fully cross-platform, it seems like a bad idea to abandon that just for the sake of small improvements.

(2) already exists in at least glibc: getaddrinfo_a. Using it ~5 years ago though I found plenty of bugs in it (e.g. https://sourceware.org/ml/libc-help/2012-07/msg00024.html )

However, getaddrinfo is not a great interface: you still can't use it to e.g. look up an MX record. For that you need res_query(3), which does not have an async interface in libc. Poettering himself wrote a library to use res_query in a separate thread: http://0pointer.de/lennart/projects/libasyncns/

However, I don't like threads, and will avoid them where possible in libraries (an example reason: I like to be in a defined state after fork()). Which means I need an async dns library that implements a resumable state machine. Lately I've been using http://25thandclement.com/~william/projects/dns.c.html

Why is (2) a worse idea? As a programmer I would vastly prefer it over dealing with D-Bus myself or dealing with DNS directly.
Non-DNS resolution protocols like LLMNR are almost entirely irrelevant, particularly on servers. Even if you do need asynchronous LLMNR support, you do not need dbus and a particular process running as PID 1 to get it - applications can make LLMNR queries using an asynchronous library, just like many currently use an asynchronous DNS library.

The fact that systemd keeps making decisions like this that are architecturally dubious and lead to lock-in is most certainly grounds for criticism.

When people enter a hostname into my application, I'm not particularly keen on implementing name resolving myself via DNS, LDAP, hosts file and so on in the application - It's absurd to suggest applications should make such a decision.

The current NSS system works nicely though, it just needs an async API.

Nothing systemd has done is preventing anybody from using any asynchronous library they feel like.

It's providing local name resolution services. And for very good reasons.

By your logic, things like NSS are useless as well, because programs themselves can read LDAP configuration files and /etc/resolv.conf on their own using libraries or whatever else they feel like using.

> I'm not sure why the author included this parting shot

Everybody loves kicking systemd as they re-invent various wheels; see... even I can't resist!

Oh great another whine-about-systemd rant.
Every time I see someone mention Rust or C++ I stop reading. Might as well mention COBOL or FORTRAN in the list of 'good' languages; the language used is irrelevant.

Anyway it is a local exploit on an old release, not a good thing but containable.

Apparently relevant reading from the article:

> A language with a better type system, such as Rust or C++ (which has std::optional) can help prevent this kind of error.

> That said, this is not about programming languages.

> Rewriting systemd in a safer language would not transform it into quality software, ...