|
A few points: 1) Failing loudly is better than failing silently. A memory corruption issue (or a bad refcount, etc.) is not a benign issue that only becomes relevant under carefully crafted exploit conditions. You need the carefully crafted exploit to get the system into an attacker-controlled state (i.e. code execution); by itself (with non-malicious inputs, usually something random or slightly atypical: atypical enough to not have been noticed yet, but typical enough that some program does it) the system is likely to either panic immediately (same result as with PaX) or to corrupt some memory, in which case you will have a lot of strange behaviour to track down later. (Users will probably blame these problems on hardware or on their user space, so you might never see them. For example, a recent OSDI paper showed that ext3/4 had several real-world data corruption bugs. If these aren’t as frequent as the recent bcache issues, no one notices.) 2) When I was doing research projects (on memory defenses in the kernel) about 3 years ago, there was no commonly used automated testing infrastructure in the kernel, at least none that I saw. This makes regressions, especially in drivers for rare hardware, hard to catch. While tests aren’t a panacea, I think Linux overestimates what fraction of problems code reviews will catch. 3) The “don’t break user space” strategy is already failing. Every mainstream distribution and embedded vendor stays on an old kernel branch. Big deployments do staged rollouts and extensive burn-in tests. This isn’t just because of the kernel, but because of extensive breaking changes everywhere (compilers, standard libraries, etc. all need to change sometimes). The last time this happened, IIRC, it was some audio bug in a strange configuration. In my experience, running a non-standard Linux audio config causes countless breakages, so an additional one in the kernel that might save my personal data from being exfiltrated is worth it.
Most users have average (and therefore well-tested) setups, which means they won’t see breakages as often. Perfect software doesn’t exist, and even MSFT backed off maintaining religious backwards compatibility (note that Microsoft’s approach was not to flame developers and hinder new development, but to build extensive compatibility shims. Often, these came with trade-offs strongly in favour of security, e.g. UAC). Breaking user space is ok; users already expect breakage, and the cost of the additional breakages is low (to users and to society as a whole) compared to the cost of security breaches [citation needed, but Linux kernel security is relied on in a lot of places]. |
Think about a smartphone - do most users want it to crash and reboot, even if some error (which could end up being a security issue) occurred? The answer is no, absolutely not. The crashing and rebooting itself isn't really that helpful. Reporting the bug to the Linux developers _would_ be helpful.
Some people do want the frequent crashing behavior and that's okay, but it's not okay to make that decision for everyone.
Also, users might expect minor breakage if someone somewhere makes a mistake, but that doesn't mean it's okay. That's like saying if someone always washes their hands before eating, it's okay if they get sick, because they were expecting that they might get sick.