|
You are correct, but airplane companies already do that for the most part and much much more. The difference in reliability between normal software and airplane software is so vast that "best practices" from normal software can not be applied to airplane software since that would be gross criminal negligence. To explain, in the 10 years prior to the 737-MAX problems there were 50,000,000 flights and software was not implicated in a single passenger air fatality. The average flight is ~5,000 KM which is ~4-5 hours. So, in ~250,000,000 flight-hours, there were two crashes due to software. A plane takes ~3 minutes to fall from cruising altitude, so we can model this as a downtime of 6 minutes per 250,000,000 hours which gives us an downtime of 1 in 2,500,000,000 or a 99.99999996% uptime (yes, that is 9 9s). In contrast, I think most software people would agree that AWS is high quality. The AWS SLA specifies a 99.99% uptime (1 in 10,000 downtime). So, by this metric, airplane software is 250,000x more reliable than normal high quality software. The point of this is that the standard for airplanes is almost inconceivably high compared to normal software. To think that they are incompetent or suggest that all they need to do is adopt X or Y common-sense/best-practice is a gross misunderstanding of what is being done and what needs to be done to improve. It would be like someone trying to tell a civil engineer making a 50-story skyscraper that they really need to adopt high quality wood construction techniques from makers of doghouses. To actually improve it, you need to consider practices 250,000x better than "best practices" and go from there. To put it another way, the solutions are actually really really good, unfortunately the problems are really really really really hard. |
The issues with the MAX were also clearly preventable and there were multiple failures of the systems (regulators, internal reviews, etc.) that were in place to catch these kinds of issues.
But as you point out, the aeronautical industry has an excellent track record for software reliability, if you evaluate reliability by hull losses. By other metrics, it's a bit more debatable (eg. the integer overflow for Dreamliners such that they need to be restarted at least every 248 days), but still keeps people moving safely.