by skrtskrt
487 days ago
a bug taking a year to track down is a negative indicator of the quality of project maintenance, not of the person who contributed the bug, whether it's due to the code itself or to the tooling and testing environments available to verify such important issues.
I would love to see Linux thoroughly and meaningfully tested. For some parts it's just... hard. (If anyone wants to get their start writing kernel code, have a crack at writing some self-tests for a component that looks complicated. The relevant maintainer will probably be excited to see literally anyone writing tests.)
For this particular bug, the cheapest place to catch the issue would have been code review. In a normal code base, the next cheapest would have been unit testing, though in this case that may not have caught it, given that the underlying bug required a caller to break the contract of a function (one part of Linux broke the contract of another; why was there no BUG_ON for that...).
Eliminating the class of issue required fairly invasive forms of introspection on VMs running a custom module. Sure, we did that... eventually.
Finding it originally required stumbling on a Linux distro that happened to manifest the corruption visibly (about once per 50ish 30-minute integration test runs, which is pretty frequent in the scheme of corruption bugs).
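To put that rate in perspective, here is a rough back-of-the-envelope sketch. It assumes the "once per 50ish" figure is a per-run probability of exactly 1/50 and that runs are independent; both are simplifications, and the function name `runs_needed` is just made up for illustration.

```python
import math


def runs_needed(p_per_run: float, confidence: float) -> int:
    """Smallest number of independent runs that shows the bug at least once
    with the given confidence, assuming a fixed per-run probability."""
    # P(no manifestation in n runs) = (1 - p)^n.
    # Solve (1 - p)^n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_run))


p = 1 / 50          # assumed: "about once per 50ish" runs
run_minutes = 30    # from the comment: 30-minute integration test runs

# Average test time between visible manifestations, in hours.
expected_hours_per_hit = (1 / p) * run_minutes / 60

# Runs (and hours) needed to be 95% sure of seeing it at least once.
n95 = runs_needed(p, 0.95)
hours95 = n95 * run_minutes / 60

print(f"mean time per manifestation: {expected_hours_per_hit:.1f} h")
print(f"runs for 95% confidence: {n95} ({hours95:.1f} h)")
```

Under those assumptions that is about a day of test time per hit on average, and roughly three days of runs before you can be reasonably sure a clean result means anything.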