|
|
|
|
|
by jfim
2211 days ago
|
|
Not to detract from your point that aeronautical industry software is reliable (it is), but the 737 MAXes that crashed were all new planes. There wasn't even 24 months between the first delivery of a MAX to the model being grounded. The issues with the MAX were also clearly preventable and there were multiple failures of the systems (regulators, internal reviews, etc.) that were in place to catch these kinds of issues. But as you point out, the aeronautical industry has an excellent track record for software reliability, if you evaluate reliability by hull losses. By other metrics, it's a bit more debatable (eg. the integer overflow for Dreamliners such that they need to be restarted at least every 248 days), but still keeps people moving safely. |
|
My primary point is that many people look at these failures and incorrectly conclude that the processes in place are objectively terrible and below average. This leads to them discounting the processes in these systems in favor of policies from vastly less reliable systems that they think are quality-focused or "best practices" because they, fairly, think "bad" in a safety-critical context means the same as regular "bad", so regular "amazing" is clearly better. In truth, "unconscionable deathtrap" and "gross criminal negligence" in the airplane world is more of a synonym for "amazing beyond belief" in the rest of the software industry. The correct takeaway is understanding that regular "amazing" is actually orders of magnitude worse than "unconscionable deathtrap" and is thus completely inadequate for the job. As a corollary, if you do not think you are doing "way better than amazing" you are probably not doing an adequate job in these contexts.
To reiterate, the solutions are really really good, unfortunately the problems are really really really really hard.