by jbangert 3137 days ago
A few points:

1) Failing loudly is better than failing silently. A memory corruption issue (or a bad refcount, etc.) is not a benign issue that only becomes relevant under carefully crafted exploit conditions. You need the carefully crafted exploit to get the system into an attacker-controlled state (i.e. code execution). By itself, with non-malicious inputs (usually something random or slightly atypical: enough to not have been noticed yet, but typical enough that some program does it), the system is likely to either panic immediately (same result as with PaX) or corrupt some memory, in which case you will have a lot of strange behaviour to track down later. Users will probably blame those problems on hardware or on their user space, so you might never see them. For example, a recent OSDI paper showed that ext3/4 had several real-world data corruption bugs. If these aren’t as frequent as the recent bcache issues, no one notices.

2) When I was doing research projects (into memory defenses in the kernel) about 3 years ago, there was no commonly used automated testing infrastructure in the kernel, at least none that I saw. This makes regressions, especially in drivers for rare hardware, hard to catch. While tests aren’t a panacea, I think Linux overestimates what fraction of problems code reviews will catch.

3) The “don’t break user space” strategy is already failing. Every mainstream distribution and embedded vendor stays on an old kernel branch. Big deployments do staged rollouts and extensive burn-in tests. This isn’t just because of the kernel, but because of extensive breaking changes everywhere (compilers, standard libraries, etc. all need to change sometimes). The last time this happened, IIRC it was some audio bug in a strange configuration. In my experience, running a non-standard Linux audio config causes countless breakages, so an additional one in the kernel that might save my personal data from being exfiltrated is worth it. Most users have average (and therefore well-tested) setups, which means they won’t see breakages as often.

Perfect software doesn’t exist, and even MSFT backed off maintaining religious backwards compatibility (note that Microsoft’s approach was not to flame developers and hinder new development, but to build extensive compatibility shims. Often, these came with trade-offs strongly in favour of security, e.g. UAC).

Breaking user space is ok; users already expect breakage, and the cost of the additional breakages is low (to users and to society as a whole) compared to the cost of security breaches [citation needed, but Linux kernel security is relied on in a lot of places].

2 comments

So one of Linus' main points in this series of posts is that failing loudly is actually not always better than failing silently or quietly, and it's really annoying when people come in making that assumption without thinking. This is also something that he is constantly repeating and ranting about, and it's arguably one of the reasons why Linux is so successful.

Think about a smartphone - do most users want it to crash and reboot, even if some error (which could end up being a security issue) occurred? The answer is no, absolutely not. The crashing and rebooting itself isn't really that helpful. Reporting the bug to the Linux developers _would_ be helpful.

Some people do want the frequent crashing behavior and that's okay, but it's not okay to make that decision for everyone.

Also, users might expect minor breakage if someone somewhere makes a mistake, but that doesn't mean it's okay. That's like saying if someone always washes their hands before eating, it's okay if they get sick, because they were expecting that they might get sick.

Interesting that you mention smartphones, because that is exactly what Google has done to their Linux fork.

Every Android app that misbehaves just gets killed without warning.

The scenarios where this might happen have been increasing since Android 7.

> Breaking user space is ok; users already expect breakage, and the cost of the additional breakages is low (to users and to society as a whole)

Which is why everyone loves rolling releases so much that Windows 10's forced upgrades are universally praised and Linux Desktop has a dominant market share.