| HN Mirror

This isn't wrong per se, but rather, it lacks concrete recommendations for what should be done differently.

I would love to see Linux thoroughly and meaningfully tested. For some parts it's just... hard. (If anyone wants to get their start writing kernel code, have a crack at writing some self-tests for a component that looks complicated. The relevant maintainer will probably be excited to see literally anyone writing tests.)

For this particular bug, the cheapest spot to catch the issue would have been code review. In a normal code base, the next cheapest would have been unit testing, though, in this situation, that may not have caught it given that the underlying bug required someone to break the contract of a function (one part of Linux broke the contract of another. Why did it not BUG_ON for that...).

Eliminating the class of issue required fairly invasive forms of introspection on VMs running a custom module. Sure, we did that... eventually.

Finding it originally required stumbling on a distro of Linux that accidentally manifested the corruption visibly (about once per 50ish 30 minute integration test runs, which is pretty frequently in the scheme of corruption bugs).