by aseipp 3411 days ago
Honestly, this is all completely a red herring if you ask me. The real solution is to neutralize the attack vectors by identifying the means -- not to waste countless cycles playing whack-a-mole by recompiling and rebooting your kernel all the time.

The best back-of-the-napkin estimate for "How many bugs does my software have" is a linear function of your LOC. You have 2x as many LOC as you did yesterday? Constant factors aside, it's reasonable to assume you have 2x as many bugs as you did yesterday.
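
To put rough numbers on that estimate, here is a minimal sketch. The defect rate of 0.5 bugs per 1000 lines is a made-up placeholder, not a measured figure for Linux or anything else, and `estimated_bugs` is a hypothetical helper. Only the linear shape is the point.

```python
def estimated_bugs(loc, bugs_per_kloc):
    """Napkin estimate: bug count grows linearly with lines of code."""
    return loc / 1000 * bugs_per_kloc


# Hypothetical defect rate, chosen only for illustration.
RATE = 0.5

yesterday = estimated_bugs(10_000_000, RATE)
today = estimated_bugs(20_000_000, RATE)

# Doubling the LOC doubles the estimate, whatever the constant factor is.
print(f"yesterday: {yesterday:.0f}, today: {today:.0f}, ratio: {today / yesterday:.1f}")
```

Swap in whatever rate you believe. The ratio between the two estimates stays at 2.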

Now you are dealing with a system (Linux) that has almost no self-protection features to stop attack methods. You are also dealing with a language in which a single error like a double free is not only very easy to make, but can also lead to full system compromise. The whole system is tens of millions of LOC. It is not hard to see that this approach of playing "recompile every week" is going to scale very poorly overall, and it cannot be done by everyone (due to logistical and cognitive overhead).

Furthermore, stopping actual attack vectors has massive bonuses compared to the whack-a-mole game. For one, it protects against threats that exist today but will only be discovered in the future. This vulnerability is a decade old. There are likely on the order of _millions_ of machines with affected kernels. Many of these machines will likely never get upgraded again (vendor kernels, EOL, whatever).

You won't be able to set CONFIG_MODULES=n on the 8 year old Linux-based router/firewall some SMB has running (after a shitty Outlook server employee password gets hacked, and someone downloads a still-active VPN certificate in their corporate email and logs in to begin pivoting -- and it isn't long till they can persist on something like that if they see it).

Second, along the same lines -- whack-a-mole does nothing to protect against targeted attacks. A theoretical targeted attacker is going to be vastly more capable than someone running a pre-canned exploit -- you cannot assume they are somehow both A) competent enough to pull off a deeply targeted attack, yet B) stupid enough to be stopped by basic measures like `CONFIG_MODULES=n`.

A recompiled kernel is not going to stop a dedicated attacker from finding a stable exploit in the other N-million lines of code inside Linux. You're playing an unwinnable game in this scenario, where the adversary only needs time. So both of these methods fail: on the small scale, it will fail for the vast majority of already-affected systems. On the large scale, for targeted attacks, it will fail against truly competent attackers.

Sitting around and investing in massive kernel building infrastructure can be completely obviated by doing one thing: running grsecurity. Or improving Linux's real self-protection/security features. Only one of those is viable at the moment.

Then, none of these attacks will work. While it's true that exploits may simply not be tuned to attack grsecurity -- it also fundamentally makes many attack vectors impossible. This should be the goal -- to make exploits almost pointlessly difficult, even with a bug at hand. Even with 5 bugs it should be maddeningly difficult. For example, it completely stops refcount based overflows. PAX_MEMORY_SANITIZE would stop the main attack vector of this particular bug -- the ability to write or invoke a function pointer through a UAF on the affected, allocated block. UDEREF and KERNEXEC stop almost every major userland/kernel-land cross execution attack, especially the most trivial ones which you see relatively often -- and they work on every platform you can think of (whereas SMAP/SMEP-like features are limited to people who have new computers, and to Intel only).
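
To make the refcount point concrete, here is a toy model of why a wrapping counter turns into a use-after-free and a saturating one does not. This is only a sketch of the general idea, not grsecurity's implementation. The class names, the 32-bit width, and the "pin at max and leak" policy are all assumptions made for illustration.

```python
UINT32_MAX = 0xFFFFFFFF


class WrappingRef:
    """Plain 32-bit counter: incrementing past the max wraps around to 0."""

    def __init__(self, value=1):
        self.value = value

    def get(self):
        self.value = (self.value + 1) & UINT32_MAX

    def put(self):
        self.value = (self.value - 1) & UINT32_MAX
        return self.value == 0  # True means "free the object now"


class SaturatingRef(WrappingRef):
    """Hypothetical hardened counter: once it hits the max it stays there."""

    def get(self):
        if self.value != UINT32_MAX:
            self.value += 1

    def put(self):
        if self.value == UINT32_MAX:
            return False  # saturated: leak the object rather than free it
        self.value -= 1
        return self.value == 0


def freed_after_overflow(ref_cls):
    # Assume a bug already let the attacker push the count close to the max.
    ref = ref_cls(UINT32_MAX - 1)
    for _ in range(3):  # three more leaked references
        ref.get()
    return ref.put()  # a single legitimate release


print("wrapping counter frees a live object:", freed_after_overflow(WrappingRef))
print("saturating counter frees a live object:", freed_after_overflow(SaturatingRef))
```

The wrapping counter goes max, 0, 1 and then a single put() frees an object that still has outstanding references, which is the UAF primitive. The saturating one trades that for a memory leak.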

We're clearly trying the whack-a-mole game now. CONFIG_MODULES=n is just another version of it. It isn't working.

1 comments

I'm familiar with grsec (having an interest, albeit not as an expert, in some of the issues it addresses, and as a long-time Hardened Gentoo user), but that doesn't override my point that 'CONFIG_MODULES=n' seems neater than 'echo /bin/false > /proc/sys/kernel/modprobe'. If you only need a predefined set of modules and have no need to load them after boot, why not just disable the facility altogether?
Because nobody is actually going to set CONFIG_MODULES=n as any kind of "security measure", first off. Or really ever do it at all. If CONFIG_MODULES=n is really a completely separate issue from security and it's about "don't use what you don't need" -- why did we ever bring it up in this thread at all? Clearly there is some connection -- the idea that less code is better, right? Don't allow flexibility where none is needed. Principle of least power. But if your point is not "overridden" by the existence of grsecurity -- a clearly security focused piece of kit -- why are we talking about it at all in this thread? We might think these are mostly unconnected because "They are obviously doing different things, so they must be unconnected, they're just kinda like peas-in-a-pod!" But they are very much intertwined, I think, in the way we think of these things. Dig on that.

My argument is that it seems like doing this helps. It obviously means less LOC, right? So clearly it's strictly less attack surface. That's clearly better, no questions asked, right? I mean, it's not anti-memory-corruption. But that's why they aren't the same! So they aren't connected in that sense, right? But it isn't attacking the real root problem. The root problem is extremely important in this case, because millions and millions of computers are impacted by it. You can't kick-the-can forever.

You're saying "Why not just disable it", but you need to ask yourself another question first: will anyone actually do that, anyway? I mean, aside from massive nerds like us - with free time. Another question is still: Can the problem be solved without this, without requiring more than is necessary while covering every perceivable use? The answer is, aside from people running on their laptops, nobody at scale -- unless they have very specific needs or resources -- is going to do this. That ship sailed long ago. And you can definitely solve this without hacking your kernel config.
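
For reference, the runtime knob quoted at the top of this reply (`/proc/sys/kernel/modprobe`) can be inspected without rebuilding anything. A minimal sketch follows. `module_loading_state` and the `proc_root` parameter are names made up for this example, and `kernel.modules_disabled` is a related sysctl that nobody in this thread brought up. I'm including it as an assumption about what you'd also want to look at.

```python
from pathlib import Path


def module_loading_state(proc_root="/proc"):
    """Read the module-loading sysctls; None means the file is not present."""
    root = Path(proc_root)

    def read(rel):
        try:
            return (root / rel).read_text().strip()
        except OSError:
            return None

    return {
        # Helper binary the kernel runs to autoload modules
        # (the thing the quoted `echo /bin/false > ...` line overwrites).
        "modprobe_helper": read("sys/kernel/modprobe"),
        # Related sysctl, not from the thread: "1" means no further loading.
        "modules_disabled": read("sys/kernel/modules_disabled"),
    }


if __name__ == "__main__":
    for key, value in module_loading_state().items():
        print(f"{key}: {value if value is not None else '(not present)'}")
```

On a kernel where a file is missing, the corresponding entry simply comes back as None. None of this touches the underlying memory corruption bug.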

It also does not address the root issue. So we all turn off CONFIG_MODULES in every distro. What's next? The next major weak point in Linux's infrastructure? Then everyone sets CONFIG_FOO=n in order to avoid everything until Linux rolls in 10-million more lines of code, then there's CONFIG_BAZ=n to set? This is why it's whack-a-mole. BTW: remember the other millions of devices _for which you cannot turn off CONFIG_MODULES_? You won't be able to turn off CONFIG_BAZ in 5 years either, I'm afraid. Because you won't be able to on your shitty router.

Let me put it this way: if Linux had a vulnerability in its ethernet stack and it seemed to be a source of problems continuously -- is the answer turning off the ethernet stack for everyone? No. It is to find the root cause.

If browsers are attacked relentlessly through vulnerabilities -- do we stop using the web forever and everyone just deletes their browser? No. Do we tell every user they're doing it wrong by "trusting the browser" when they should "know" software has flaws? No. We instead engineered browsers to be as resilient as possible in the face of a very, very hostile internet. Chrome is an example of the real progress we have made here. There is more to do. Why do this? Because this is simply the way most people use the internet: with a browser. With an ethernet stack. With CONFIG_MODULES=y.

Try using the "5 W's" here:

Why are so many vulnerabilities here? What trigger mechanisms do people use to attack it? Where are they introduced? When are they exploitable?

The reality is: a lot of the answers to these questions have very little to do with module autoloading.

Maybe there are lots of vulnerabilities because the code was not extensively hardened, or not designed around the scenarios it is used in now (user namespaces are a good example). Perhaps people very often use the same trigger mechanisms for a payload (for example, overwriting a function-ptr struct, such as the one that takes care of module callbacks, like read() syscalls, etc). The bugs might have been introduced long ago. They might be actively and reliably exploitable right now, or maybe they are very difficult to exploit in any realistic scenario.

These are all very common attack scenarios for Linux exploits, historically, over the past few years. Global function pointers overwritten trivially (because of no `__ro_after_init` until 4.8+). Simple payloads because there wasn't even a way to block `commit_creds` for a long time. Exact same triggers, like refcounts, UAFs, double frees.

This is beyond autoloading. It is a process problem.

Again, "rebuild your kernel with a config option changed" does not logistically scale for the vast majority of people. It's basically a non-starter. People will go with their distro kernel instead (as they very likely should, to be honest).

Just to be clear: I am not saying CONFIG_MODULES=n is bad -- by all means, turn it off. If it makes you feel good, speeds up your kernel builds -- whatever. But it does not really address the root problem here, so suggesting it is a red herring; simply turning off this stuff is just a bandaid, it doesn't address actual attack vectors. If you're bringing up CONFIG_MODULES=n not as a security measure but as a way to just "reduce attack surface via LOC" -- these are the same thing! It's just not an actually good security measure. It doesn't scale. Again, it is a red herring if it is not actually a "security measure".

I suppose my point is there's a bigger gap here. We think very "simplistic" things like this are good, because they "strictly must improve everything and are justified by that", but often they treat symptoms and not the real disease. They don't actually stop the attacks. They "shake the table" a little, to quote someone on Twitter recently. They can even increase complexity and make real defenses more brittle.

FWIW: Here's an example of that, where these kinds of "soft" mitigations actually backfire - that SLAB randomization code is only a small obstacle for a skilled attacker (and most kernel attackers _will_ be skilled): https://github.com/torvalds/linux/commit/c4e490cf148e85ead0d... -- so, given this: why exactly are we paying the cost of these regressions, of half-baked fixes that do not stop SLAB-based exploits and real attackers? If my kernel is going to have a shitty, bizarre regression, I might as well get actual security out of it.