by WalterBright 47 days ago
> Everything is "simple" with hindsight in mind.

The fixes are still simple and cost little.

I used to work at Boeing on airliner design. The guiding principle is "what happens when X fails" and design for that. It is not "design so X cannot fail", as we do not know how to design things that cannot fail. For Fukushima, it is "what happens if the seawall fails", not "the seawall cannot fail".

Airliners are safe not because critical parts cannot fail, but because there is a backup plan for every critical part.

Venting explosive gas into the building seems like a complete failure to do a proper failure analysis.


>at Boeing on airliner design. The guiding principle is "what happens when X fails"...Airliners are safe not because critical parts cannot fail, but because there is a backup plan for every critical part.

And yet creating a culture that is vigilant and consistently applies due diligence is hard. To that point: Boeing identified the 737 MAX MCAS as 'hazardous' in their analysis. Putting aside that 'catastrophic' was the more appropriate rating, they still did not design the system to handle its own failure appropriately. (By their own processes, 'hazardous' meant it should not be designed with single-point hardware failures.)* That implies it is as much a human/cultural issue as a technical one.

* Before any claims that the system was designed just fine because the pilots could have avoided the issue with the appropriate actions: those are administrative hazard mitigations, which are generally considered less desirable than hardware fixes, especially when engineering mitigations are already installed but not used. Removing the hazard >> engineering controls >> administrative controls >> PPE. To the GPP's point: hindsight is easy; managing risk, people, and processes is hard.

The backup for MCAS was simple:

1. restore normal trim using the thumb switches (which override MCAS inputs)

2. turn the trim system off.

The proof of that is that's what the crew did in the first MCAS incident, and they landed safely.

Check the previous note I left above with the * on why that is considered a poor mitigation.

Administrative procedures are bad mitigations in general, but especially bad when a) it's a safety-critical issue and b) the hardware for an engineering mitigation is already installed. That's like saying death could have been avoided if people had just packed parachutes (PPE). Maybe true, but bad hazard mitigation.

I do understand your point, and the MCAS system needed improvement.

But still, dealing with runaway stabilizer trim is a basic thing every pilot needs to know. One crew did it, and proceeded normally and safely. Two other crews did not follow emergency procedures, and paid the ultimate price. After the first crash, an Emergency Airworthiness Directive was issued reiterating the procedure. The Ethiopian Airlines crew did not follow the procedure.

The stab trim cutoff switch is prominent on the center console because it is a very important switch.

I've also talked to 737 pilots, and another emailed me about it. They confirmed that they consider those crashes pilot error.

Nevertheless, I agree that the MCAS system was deficient.

The main reason I don't consider it pilot error is that the pilots did not get training on the system. So the proper mitigation required quick understanding of a system they did not know about, which is incredibly difficult with an intermittent failure like MCAS. If it was identified too late, the force required for trim was too great to be applied manually. Expecting that knowledge and timing to be in place is why it's not a reliable mitigation when there is no training.

There are lots of proximate causes, but the lack of training to avoid a new airframe certification is a huge one.