by rjvehn 2554 days ago
There are a myriad of issues that put the planes at risk, but I think the biggest issue of all is that the control system (MCAS), once engaged, ignores pilot feedback.

"But with the MCAS activated, said Fehrm, those breakout switches wouldn’t work. MCAS assumes the yoke is already aggressively pulled back and won’t allow further pullback to counter its action, which is to hold the nose down.

Fehrm’s analysis is confirmed in the instructions Boeing sent to pilots last weekend. The bulletin sent to American Airlines pilots emphasizes that pulling back the control column will not stop the action.

Fehrm said that the Lion Air pilots would have trained on 737 simulators and would have learned over many years of experience that pulling back on the yoke stops any automatic tail maneuvers pushing the nose down." [0].
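To make the quoted behaviour concrete, here is a toy model of the rule being described. It is a sketch under big assumptions: the names `TrimRequest` and `trim_allowed` and the `"mcas"`/`"autopilot"` labels are hypothetical, and real 737 trim logic involves many more inputs (cutout switches, flap position, AoA) than this models. It only captures the two behaviours the quote contrasts: pulling back normally stops automatic nose-down trim, but per Fehrm's analysis it does not stop MCAS.

```python
from dataclasses import dataclass


@dataclass
class TrimRequest:
    source: str      # hypothetical label, e.g. "mcas" or "autopilot"
    nose_down: bool  # True if this request trims the nose down


def trim_allowed(req: TrimRequest, column_pulled_back: bool) -> bool:
    """Toy model of the behaviour described in the quote (not Boeing's logic).

    Pilot expectation: pulling the column back while the tail is trimming
    nose-down stops the automatic trim.
    Described MCAS behaviour: that column-based stop does not apply.
    """
    opposes_trim = column_pulled_back and req.nose_down
    if req.source == "mcas":
        # Per the quoted analysis, pulling back does not stop MCAS.
        return True
    # Any other automatic trim is halted by an opposing column input.
    return not opposes_trim


# The habit learned over years of flying: pull back, the nose-down trim stops.
print(trim_allowed(TrimRequest("autopilot", nose_down=True), column_pulled_back=True))  # False
# The same habit against MCAS: the trim keeps running.
print(trim_allowed(TrimRequest("mcas", nose_down=True), column_pulled_back=True))  # True
```

That mismatch between the learned habit and the new rule is exactly the "ignores feedback" problem.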

If you bought a new computer, how pissed off would you be if you lost data not because of a hard-drive failure, but because of a weird design decision in the 1-penny caps lock key? Imagine spending the time to set up a proper RAID system and losing everything because of a design decision in the keyboard.

I mean, if the media keeps reporting on the small stuff that's wrong, people are going to go "well, planes are complex and things happen" and almost ignore the seriousness of a design decision that ignores user input.

[0]: https://www.seattletimes.com/business/boeing-aerospace/faa-e...


To be fair, ignoring user input could potentially have saved Air France 447... I actually can't think of an automated, foolproof system that would've fixed 447, but incorrect input was a major factor.
IIRC, the cockpit voice recording included a comment from one of the co-pilots about how pulling back on the yoke couldn't cause a stall. The assumption was that the Airbus's fly-by-wire system would prevent it and ensure the aircraft still climbed as long as the pilot held back on the stick.

The co-pilot apparently didn't realize that the sensor issue that disabled the autopilot also disabled the stall prevention. And that's despite an audible "STALL" warning being repeated in the background.

The captain was not in the cockpit when the whole situation started, but as he re-entered the cockpit during the stall he saw one of the co-pilots holding back on the yoke and told him to push the yoke forward to prevent the stall. The co-pilot followed the instructions, but only for a few seconds before pulling the yoke back again.

All of this is to say if the plane hadn't been known to ignore user inputs in most situations, the co-pilot might not have assumed the Airbus would do the right thing and climb no matter what when pulling back on the yoke. So in a sense, never ignoring user inputs might have also saved Air France 447.

Apparently (I think I read this in the Langewiesche feature) the plane ended up in such a deep stall that the flight control software started ignoring the AoA sensor data as implausible, and the STALL warning stopped. But when the co-pilot stopped pulling back on the stick, the AoA decreased and the STALL warning sounded again.

This might have convinced him that easing off on the stick was actually causing the stall, which was tragically misguided.

Exactly - the computer had switched contexts, but the pilot hadn't. And expecting pilots to switch their mental map of expected behaviour when the computer does (and did so with, from the accounts I read, very minimal indication that it had done so) during a high stress situation, is asking for trouble.
The indications were different alerts continuously sounding. It’s a complex problem, there’s a lot that can go wrong at once.
This is my number one objection to over-reliance on automation.

Every piece of software is a mechanism. In order to truly be able to safely use something without outside aid, one must have a complete mental map of the mechanics of the system in question. Abstraction helps, but not when you start getting into high-risk contexts.

One of the best explanations I have read on that issue: short, and it sums the problem up accurately.
The copilot pulling the yoke back continued to do so, long after the other, much more experienced, copilot had formally assumed control and had attempted to bring the nose back down by pushing the yoke forward. Ultimately the inexperienced copilot fighting against his more experienced superior was what doomed the airplane. Both the senior copilot and the captain immediately identified the problem and attempted to take the correct action.

This is not a problem with how the system works, since this behaviour is explicitly communicated to pilots. The instrument panel even displays which control law the plane is in. There are only a handful of control laws and the differences between them aren't that complex. Anyone with sufficient experience flying Airbus products knows this.
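A rough way to see the "mental map" issue from this thread: the same stick input means different things depending on the active law. The sketch below is a deliberate simplification and an assumption on my part: it treats only normal law as providing high angle-of-attack (stall) protection and ignores Airbus's sub-modes (alternate law 1 and 2), so `HIGH_AOA_PROTECTION` and `stick_back_is_safe` are hypothetical names, not anything from Airbus documentation.

```python
from enum import Enum


class ControlLaw(Enum):
    NORMAL = "normal"
    ALTERNATE = "alternate"
    DIRECT = "direct"


# Simplified, assumed mapping: only normal law is treated as guaranteeing
# that the aircraft will not stall, however hard the stick is pulled back.
HIGH_AOA_PROTECTION = {
    ControlLaw.NORMAL: True,
    ControlLaw.ALTERNATE: False,
    ControlLaw.DIRECT: False,
}


def stick_back_is_safe(law: ControlLaw) -> bool:
    """Can holding full back stick be relied on not to stall the aircraft?"""
    return HIGH_AOA_PROTECTION[law]


for law in ControlLaw:
    print(law.value, stick_back_is_safe(law))
```

The point of the lookup table is that the answer flips when the law changes, so a pilot whose habits were formed in normal law has to notice the change and update that table in his head.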

I don't know a whole lot about this, but I seem to remember one design decision that, while not wrong, differed from the generation before: the airplane's yokes were not mechanically coupled to one another. If they had been, the experienced pilot could have felt the other pilot pulling back on the controls. Instead, the two pilots were pulling the controls in different directions AND the plane was averaging the control inputs, giving neither pilot any feedback that their inputs were wildly inconsistent or contradictory.
>The copilot pulling the yoke back continued to do so, long after the other, much more experienced, copilot had formally assumed control and had attempted to bring the nose back down by pushing the yoke forward.

At least according to the official accident report, neither of the pilots at the controls consistently made nose down stick inputs.

Basic, rudimentary 'stick and rudder' flying skills were a big factor in AF447's crash. All old-school pilots know that when your aircraft is in a nose-high stall condition, you never keep pulling back on the stick; instead you push it forward to lower the nose and get the wings flying again.

The fact that the co-pilot in question kept holding the stick to the back stops was the main reason that the aircraft wallowed into the sea. Weirdly, he did let go of the stick for a brief few seconds, which was the only time during the harrowing descent that the aircraft started to behave normally, but then he pulled it back and held it back right up until impact.

Yep, the aircraft could have ignored these inputs, but the inputs are counter to what any reasonably skilled pilot would have done. (Note: Different to the MAX crashes where pulling back on the stick under speed IS the accepted way to stop a descent.)

Part of the issue may have been that the plane had slowed down so much that the stall warning stopped (it disengages below a certain airspeed apparently). When he stopped pulling up, the plane sped up and the stall warning started again. Pull up again, plane slows down, stall warning stops.
I wonder if something about this system was changed after that incident - why not keep sounding the stall alarm if the plane ends up outside the flight/sensor envelope? Can’t you assume that it didn’t magically cross the stall zone back into normal flight?
No, because an equally (probably more) likely scenario is that the relevant sensors are giving bad readings.
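Here is a toy version of the feedback loop described a couple of comments up. Both thresholds below are illustrative assumptions, not the A330's actual values, and `stall_warning` is a hypothetical function: the AoA reading is treated as invalid below some airspeed floor, which silences the warning exactly when the stall is deepest.

```python
def stall_warning(aoa_deg: float, airspeed_kt: float,
                  aoa_threshold_deg: float = 10.0,
                  min_valid_airspeed_kt: float = 60.0) -> bool:
    """Toy stall-warning logic with an airspeed validity gate (assumed values)."""
    if airspeed_kt < min_valid_airspeed_kt:
        # Below the floor the AoA data is treated as implausible, so the
        # warning is suppressed even though the real AoA may be extreme.
        return False
    return aoa_deg > aoa_threshold_deg


# Hypothetical numbers. Stick held back: huge AoA, very low airspeed -> silence.
print(stall_warning(aoa_deg=40.0, airspeed_kt=55.0))  # False
# Stick eased forward: airspeed climbs above the floor, AoA still high -> alarm.
print(stall_warning(aoa_deg=35.0, airspeed_kt=70.0))  # True
```

Flipping the gate so the alarm keeps sounding below the floor would fix this particular loop, but, as the reply says, it would also fire whenever the airspeed or AoA sensors are simply feeding in garbage.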
> Basic, rudimentary 'stick and rudder' flying skills were a big factor in AF447's crash. All old-school pilots know that when your aircraft is in a nose-high stall condition, you never keep pulling back on the stick; instead you push it forward to lower the nose and get the wings flying again.

Except on an Airbus. If the plane is in "normal law", it won't go into a stall condition. Here's the Airbus training video.[1] Note, by the way, that the automatic recovery includes going to full throttle. The throttle levers don't move, though. Unlike Boeing, where the levers are moved by the computers and the pilot can overpower that. In the 737 Max, though, it's worse, because the engines are mounted too high and full thrust pushes the nose down. So "full power and back off on the stick" will not work.

[1] https://youtu.be/G161aMYCzbQ?t=100

The engines in the MAX are still producing thrust below the centre of mass of the plane...how would this produce a nose down pitch at TOGA thrust?
Sorry, backwards.
>The fact that the co-pilot in question kept holding the stick to the back stops was the main reason that the aircraft wallowed into the sea. Weirdly, he did let go of the stick for a brief few seconds, which was the only time during the harrowing descent that the aircraft started to behave normally, but then he pulled it back and held it back right up until impact.

This description isn't consistent with what's in the accident report. Where are you sourcing it from?

AF 447 wasn’t all that different from this situation. One of the co-pilots was trying to pitch the nose down to recover from the stall. The other was panicking and trying to pitch up. The plane averaged their inputs, without giving feedback via the stick that this was happening. It wasn’t until very late in the flight that they figured out what was happening, and then it was too late to recover.

Obviously there was some significant pilot error in this case, but a big contributor may have been that the pilot who was trying to correct the stall didn’t understand that the plane was ignoring his input because of the averaging.
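Whether it actually happened on AF447 is disputed further down, but the mechanism being argued about is easy to sketch. My understanding (an assumption, not checked against the manuals) is that Airbus describes it as an algebraic sum of the two sidestick deflections, limited to one stick's full deflection, plus a "dual input" warning, rather than a true average. The real system also has a priority takeover pushbutton, which this ignores. `combine_sidesticks` and the deadband value are hypothetical.

```python
from typing import Tuple


def combine_sidesticks(left: float, right: float,
                       limit: float = 1.0,
                       deadband: float = 0.05) -> Tuple[float, bool]:
    """Return (combined pitch command, dual_input_warning).

    Inputs are normalised deflections in [-1, 1], positive = nose up.
    Assumed model: the deflections are summed and clipped to a single
    stick's full deflection; a warning is raised when both sticks move.
    """
    dual_input = abs(left) > deadband and abs(right) > deadband
    combined = max(-limit, min(limit, left + right))
    return combined, dual_input


# Full back stick on one side, half forward on the other.
print(combine_sidesticks(1.0, -0.5))  # (0.5, True)
```

Note that the pilot pushing forward never feels the other stick; the only cue is the aural warning, which is the feedback gap the comments above are talking about.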

I don't think the flight control was averaging the pilot inputs.

From this link: https://en.wikipedia.org/wiki/Air_France_Flight_447#Human_fa...

In April 2012 in The Daily Telegraph, British journalist Nick Ross published a comparison of Airbus and Boeing flight controls; unlike the control yoke used on Boeing flight decks, the Airbus side stick controls give little visual feedback and no sensory or tactile feedback to the second pilot.

Ross reasoned that this might in part explain why the pilot flying's fatal nose-up inputs were not countermanded by his two colleagues.

In a July 2012 CBS report, Sullenberger suggested the design of the Airbus cockpit might have been a factor in the accident. The flight controls are not mechanically linked between the two pilot seats, and Robert, the left-seat pilot who believed he had taken over control of the aircraft, was not aware that Bonin continued to hold the stick back, which overrode Robert's own control.

That suggests there was only ever one pilot flying, and the way that pilot reacted to the situation played a big part in the final crash.

> That suggests there was only ever one pilot flying

"Pilot flying" is a human-factors title, not a software function-lock. It just indicates who has control responsibility at that moment but it is not enforced by technical means.

It is intended to eliminate ambiguity in crew functions; the PF can be a newbie copilot even if the commander of the aircraft is a 30-year-service Captain who would become the PNF at that point. It's all part of Crew Resource Management theory.

There should only be one PF in a cockpit at any one time, precisely to avoid the situation that arose with the Air France flight where the computer was receiving inputs from two pilots.

I was responding to the claim the flight control was averaging the two pilot inputs, because if that was the case then two pilots would have been flying the plane.

My point was that I doubt this was in fact happening, and that there was only ever one pilot in charge.

> the Air France flight where the computer was receiving inputs from two pilots.

The link and quotes I posted suggest that was not happening.

The system was just ignoring the other pilot (and that was the design fault), because it also failed to tell that other pilot he was being ignored.

>because it also failed to tell that other pilot he was being ignored.

It didn't fail to tell him. That's what the dual input alarm is for.

You may be right about the averaging. From rereading the accident report, the Pilot Flying took back control of the plane after the Pilot Not Flying engaged his controls and tried to pitch down.

But, it’s the same basic idea. The PNF thought he’d gotten control of the plane, and didn’t understand why his input wasn’t having an effect. He didn’t get feedback from the stick telling him a different input was being honored. And neither pilot appears to have been fully aware that they were in a flight control mode where there was a risk of stalling. The PF especially never seemed to have made that connection, and the PNF took a fairly long time to call it out. As a result, the PF may not have been aware that he needed to actively keep the angle of attack inside the flight envelope.

So, PNF tries to pitch down, but isn’t aware the plane got put back into a mode where he isn’t in control. PF is pitching up, but isn’t aware the plane switched to a mode where this could lead to a stall. That’s the similarity I was getting at.

It’s weird to me how persistent this story is.

From the reported control traces, there was no prolonged period of dual input. There were 3 or so brief moments of dual control input (1 - 2 seconds), during which a warning was sounded. The pilots never spoke out loud about it, but we can infer that they heard the dual input warning and were aware when it happened because the sequence of events was the same each time; inputs from both joysticks received -> aural dual input warning -> input from one joystick stops.

Something about the idea of two pilots inadvertently fighting each other for control of the aircraft has definitely caught peoples’ imagination. But it didn’t happen.

100% correct.

The available evidence suggests this averaging thing never happened and certainly was not the cause of the crash.

What we have is a situation of two pilots in close proximity to each other not communicating, and the captain unfortunately caught in the toilet.

The incorrect user input on AF447 happened AFTER all of the automatic systems had failed due to sensor clogging. How could ignoring user input have helped the flight when the plane's computer giving up was the cause of the manual takeover in the first place?
Yes, I guess that's where I said I don't know of a foolproof system; and yeah, how would it know the input was incorrect? I was simply saying that incorrect input, given the actual situation, was an issue.
I'd say the improper input was the most direct factor, as it was responsible for the stall condition all the way into the ocean.

But there are multiple major factors leading up to that, including the lack of high-altitude training in direct law, the fact that the simulator didn't exactly simulate high-altitude stalls, and the fact that the stall warning stopped when the angle of attack was beyond the sensor limit. All of these things are major, and the final report put a lot of blame on Airbus and Air France, as well as on the startle effect basically stopping the pilots' brains from working the problem. The senior pilot who arrived didn't have that, and quickly figured out the source of the problem, but by then it was too late: there wasn't enough altitude to recover.

After so many decades, is there no golden rule here? (Genuine question.)

Something like a principle of falling back to zero automation in case of confusion? Something hardcoded deep in the design so that the people in charge (the flight crew) can know for sure that whatever happens is in their hands?

It was designed to trim down when the pilot is pulling up. Of course pulling up is not going to inhibit it. That's the point of the system!
Of all the many problems of the Boeing 737Max situation, and there are several, for once I don't think the media reporting is one of the biggest. But, your basic point stands.
The root problem is the culture at Boeing and the FAA has shifted from safety first to profit first.

The investigative reporting from The Seattle Times[0] indicates that safety engineers were pressured to avoid delays in the rush to get out a competitor to the A320. Furthermore, their safety analysis was based on flawed assumptions made to meet an artificial constraint: not requiring pilot simulator training, in order to appease the airlines they were selling to. Finally, the FAA is allowing industry to self-certify critical systems with lax oversight.

It is easy to get lost in the technical details of why a particular catastrophe happens. The common throughline is a broken culture where deviance is normalized and those who speak out are ignored. It's the same story with Chernobyl, Fukushima, the El Faro, the USS Fitzgerald and USS John S. McCain, Air France 447, and now the 737 Max.

[0] - https://www.seattletimes.com/seattle-news/times-watchdog/the...

"The common throughline is a broken culture where deviance is normalized and those who speak out are ignored."

The must-read on the issue says so too:

"The Seven Signs of Ethical Collapse: How to Spot Moral Meltdowns in Companies", Marianne M. Jennings

Thanks for the steer. Wasn’t aware of this. Just read her presentation and loved it.
Fukushima? That doesn't belong on the list. There is some limit to any engineering decision. Complaining about MCAS is totally reasonable, but it would be unreasonable to argue "The Air Max is not safe because if I hit it with enough Stingray missiles it won't fly anymore." Like, yeah? No kidding?

Fukushima was designed to survive the earthquake, and it did, it just wasn't designed to survive the earthquake and also the tsunami.

Fukushima survived the earthquake and even survived the tsunami. The generator got wiped out, but even that wasn't what ultimately led to the disaster. It was that the battery backup eventually ran out of power (not unexpected) and the connectors for recharging it were old and of a format that isn't used any more. There was no way of recharging the battery backup and so the pumps eventually failed.

It's one of those problems where there are literally a million things that could go wrong and since the emergency system is not used normally, it's easy to overlook a critical problem.

So I agree with you. Fukushima was not a design error -- or at least not a design error that could have been reasonably fixed at the time that the reactor was originally designed. It was an error in maintenance. Obviously it's better to have a design where loss of power doesn't cause a meltdown, but I don't think such designs were available when Fukushima was built. CANDU reactors existed at that time, but I think they were still considered experimental. Pickering came online in 1971, so basically at the same time as Fukushima. I'm not familiar with other passive designs, so possibly someone else can make an observation.

But basically, as far as I can tell, Fukushima was a reasonably normal nuclear power plant for the time it was designed. The Air Max seems to have suffered from problems because of design decisions that are not considered normal.

> since the emergency system is not used normally, it's easy to overlook a critical problem.

This is such an important antipattern when robustness is a goal.

Totally agree. Done it myself more times than I care to admit. One small quibble, if I may: "antipattern" originally meant something that looks like a good design pattern but will actually bite you in the end if you use it as intended. This is not so much an antipattern as an unfortunate reality (you have to maintain compatibility with external interfaces for the length of the project). How much bit rot have I seen in my career?
But it could have been fixed quite easily by simply siting the backup generators above ground. That was a stupid design error. Tsunamis are not unknown in Japan, after all.
I refrain from using “simply” or “just” unless I am the person expected to design or fix the problem. Ahead of time. Saying after a disaster caused by the most powerful earthquake ever recorded in Japan[0] that the solution was “simply” to do some coincidentally simple-sounding thing is not credible.

[0] https://en.m.wikipedia.org/wiki/2011_Tōhoku_earthquake_and_t...

Yeah, this is something that I think doesn't really resonate with people well. The reactor site is 25 meters above sea level. I'm not exactly sure how high the generators were, but they were well above the level that experts thought was safe at the time. The earthquake was a 1 in 1000 year event and so there was no data on record to help them model the resultant tsunami. In the years following the tsunami, the way people modelled waves radically changed based on the new data.
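A quick way to put "1 in 1000 year event" in perspective is the chance of seeing at least one such event during a plant's working life. The sketch assumes each year is independent with the same probability, and the 40-year operating life is my assumption, not a figure from this thread.

```python
def prob_at_least_one(annual_prob: float, years: int) -> float:
    """Chance of at least one event in `years`, assuming independent years."""
    return 1.0 - (1.0 - annual_prob) ** years


p = prob_at_least_one(1 / 1000, 40)  # 40-year operating life is an assumption
print(f"{p:.1%}")  # 3.9%
```

So roughly a 4% chance over the plant's life: rare enough that nobody had data on it, but not so rare that it could be safely ignored if anyone had known the true odds.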

There are a couple of caveats. First, there were markers saying that a historic tsunami had come in much higher than models would have predicted. However, they are very old. It's just a rock stuck in the ground with some writing on it. Stuff like that is all over Japan (there are lots of markers around where I live -- I don't think anybody pays any attention to them at all. Probably we should, but usually they just mark boring stuff ;-) ). It's like seeing a Roman road marker in Europe. Interesting, but not really noteworthy. It's only after the tsunami that people saw the markers and said, "Holy cow. There's a marker here showing that a tsunami came up this far". Even then it's a far cry from seeing that to saying that we need to invalidate all our wave models.

Secondly, I think there is some evidence that in the few years preceding the tsunami, researchers were getting worried that their wave models were not correct. I think it's even the case that nuclear plant companies were aware of this. When I first moved to Japan in 2007, there used to be a section of the Japan Meteorological Agency's website that showed, among other things, a map of how far inland a tsunami would theoretically go for all parts of Japan. It also listed the maximum wave size for every single place along the coast. It noted places where sea walls were not high enough and estimated worst-case damages and numbers of casualties. Around 2009 it disappeared. I tried to find out where it went, and the response I got was that it needed to be updated and that it would return at some point in the future. Of course, it never came back. At the same time, I've heard that literally a few years before the Tohoku earthquake there was serious debate about whether or not the wave models were correct. However, I think it's pretty clear that in 1967, when they started construction at Fukushima, they had absolutely no idea that they were building in a potentially unsafe area.

It really sucks, and I think it's fair to say that as humans we probably have too much hubris when it comes to our science. The fact that you have no reasonable way of knowing you are making a mistake doesn't mitigate the problems that result from that mistake.

> Fukushima was designed to survive the earthquake, and it did

untrue

It was designed to survive both a tsunami and an earthquake. Tsunamis often are caused by earthquakes.

That Fukushima survived the earthquake is a myth. The plants had an emergency shutdown and there was very little time for a damage assessment, which would have taken weeks or months.

Whether the plant would ever have been restarted after the earthquake is unknown. It could have been a full loss, like several reactors in Japan, which will never be restarted.

The plant lost electrical connection to the grid, of course it had an emergency shutdown. Otherwise they'd have had to have found some other method of dissipating megawatts of electricity.

The fact that other plants have not been restarted is at least as likely to be political as it is technical.

> The plant lost electrical connection to the grid, of course it had an emergency shutdown. Otherwise they'd have had to have found some other method of dissipating megawatts of electricity.

A nuclear power plant always has an immediate shutdown in case of a strong earthquake:

'Japanese nuclear power plants are designed to withstand specified earthquake intensities evident in ground motion (Ss), in Gal units. The plants are fitted with seismic detectors. If these register ground motions of a set level (formerly 90% of S1, but at Fukushima only 135 Gal), systems will be activated to automatically bring the plant to an immediate safe shutdown.'

http://www.world-nuclear.org/information-library/safety-and-...
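For anyone not used to Gal units: 1 Gal is 1 cm/s², so the 135 Gal trip level quoted above is about 0.14 g. The trip function below is a hypothetical simplification (a single threshold on peak ground acceleration); real seismic trip systems use multiple sensors and voting logic that the quote doesn't describe.

```python
STANDARD_GRAVITY_GAL = 980.665  # 1 g expressed in Gal (1 Gal = 1 cm/s^2)


def gal_to_g(gal: float) -> float:
    """Convert a ground acceleration in Gal to multiples of g."""
    return gal / STANDARD_GRAVITY_GAL


def seismic_trip(peak_ground_accel_gal: float, setpoint_gal: float = 135.0) -> bool:
    """Hypothetical single-threshold trip: shut down once the setpoint is reached."""
    return peak_ground_accel_gal >= setpoint_gal


print(round(gal_to_g(135.0), 3))  # 0.138
```

The point of the quote stands either way: reaching the setpoint triggers an automatic shutdown by design, independently of whether the grid connection survives.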

> The fact that other plants have not been restarted is at least as likely to be political as it is technical.

'Likely to be political' is not a category in nuclear safety.

Take this example from 2008:

http://www.world-nuclear-news.org/C/Tepco_counts_earthquake_...

'Tepco's announcement yesterday included a section dedicated to the effects of the magnitude 6.8 Niigata-Chuetsu-Oki earthquake, which violently shook the Kashiwazaki Kariwa nuclear power plant on 16 July 2007. All seven of the reactors remained safe during the event, which caused huge damage to the region and several deaths. However, checks to establish the units' safety to return to service are proving very lengthy, and could continue into the latter part of 2008.

The ongoing inspections at Kashiwazaki Kariwa are to cost ¥122 billion ($1.13 billion) in FY2007. In addition, ¥25 billion ($233 million) will go on civil engineering repairs while a geological survey of the site is to cost a total of ¥2 billion ($18 million).'

The inspections alone, after a safe shutdown of that nuclear power plant, cost more than US$1 billion...

Though one might argue that the risk of tsunami is not independent of the risk of (certain kinds of) earthquake for pacific rim nations. Failure to take that into account might be considered a design decision.
Over the years there were plenty of reports warning about the earthquake and tsunami risk, and they were buried.
> The root problem is the culture at Boeing and the FAA has shifted from safety first to profit first.

So the same problem that pervades society everywhere now? I’m not sure if that wasn’t the case before, but it feels to me that people previously wanted to make lots of money by building great products, and they’ve just left the ‘building great products’ part behind.

There are still companies that do that; the ones I'm aware of are mostly from Germany and Japan. Assembly line robots, for instance, but also Panasonic laptops, especially the Japan-only ones (and maybe Fujitsu; I haven't tried them for a while, but I used to be a big fan of their P1510 range of 2-in-1s). They would not sell anywhere else because they are crazy priced, but they are virtually indestructible and go on forever.
Despite the 737 Max fiasco, airplanes today are far safer than ever before. Since 1970, annual deaths have been cut by >80%, while air traffic has increased by a factor of 10.

Cars have seen similar improvements. So have food hygiene, workplace safety, and most any measurable safety record I can think of.
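The two numbers above combine into a per-flight figure that is even more striking. A small sketch of the arithmetic, using only the figures from the comment (">80%" is treated as exactly 80%, so the result is an upper bound on today's relative rate):

```python
def relative_rate(deaths_ratio: float, traffic_ratio: float) -> float:
    """Fatality rate per unit of traffic, relative to the baseline year."""
    return deaths_ratio / traffic_ratio


# "cut by >80%" -> at most 0.2 of the 1970 deaths; traffic up by a factor of 10.
r = relative_rate(deaths_ratio=0.2, traffic_ratio=10.0)
print(f"{r:.0%} of the 1970 rate, about {1 / r:.0f}x safer per unit of traffic")
# 2% of the 1970 rate, about 50x safer per unit of traffic
```

In other words, if the annual death toll is at most a fifth while traffic is ten times higher, the risk per unit of traffic has dropped by at least 98%.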

You are using statistics wrong here. The FAA process changed, and Boeing entered a panic mode, so its process changed too. Applying statistics from the past, when the processes were different (the FAA did its job and Boeing was not cutting corners), is incorrect.

We need new statistics, and these are the stats for the recent Boeing plane: the MAX crashed 2 times, the first crash was blamed on the pilots and no serious urgent investigation was performed, the MCAS issues that were discovered were trivialized, and Boeing is still trying to shift the blame onto bad software and not onto the actual causes.

My point is that you can't use statistics that way, there are rules on how to apply them correctly and many pitfalls when applying statistics in real world scenarios.

No, I'm using statistics exactly as intended. Last year was much safer than any year in the 20th century, and so will this year be.

You have some ideas about how everything used to be better in the past, and you're trying to hold onto them in the face of overwhelming evidence to the contrary.

There isn't some sudden increase in the pressure to earn money that wasn't there in, say, 2008. And while the 737 Max process was obviously flawed, the argument above was that somehow there are fundamental problems across companies and industries, not just a single model. Quote from above:

> So the same problem that pervades society everywhere now? [I]t feels to me that people just left the ‘building great products’ part behind.

While evidence of a breakdown in Boeing's ability to design safe planes would indeed lag, there are many other critical processes where regressions would show rather quickly, such as maintenance, fuel quality, air traffic control, IT security, etc.

That is exaggerated. It doesn't pervade everywhere.
The difference is, most products aren't safety critical like aeroplanes.
The problem was that because it was an unknown component its failure mode was not known, which created/exacerbated the panic.

There is a simple procedure, which is already part of the standard memory checklists: what to do in case of runaway trim. The pilots must/should be able to notice the trim wheels spinning; they can then disable the trim motors and fall back to cranking the wheels manually.

The problem is, panic makes a mess of almost anybody. Sure, pilots shouldn't be anybody, but we know how much cost cutting has been going on.

Well, there's also the fact that the stabilizer can get trimmed far enough that the aerodynamic load on it exceeds the pilots' ability to move it manually. In that case the only real solution early after takeoff would be to re-enable the trim system so you could use the trim motors. The other solution is to nose down to release some of the pressure on the stabilizer so it can be cranked manually again, but this isn't really an option early in the flight because you don't have the altitude to slowly undo the erroneous trim while diving.
I wasn't aware of that, and it seemed quite ridiculous that there was not enough torque to rotate the screw, but looking at it, the angle of the helical threading is pretty steep.

https://www.netairspace.com/photos/N37474/United_Airlines/Bo...

And it turns out this can happen even with the motorized trim, and there is a maneuver to work around the load. But it got removed from the manual...

https://www.pprune.org/tech-log/619326-boeing-advice-aerodyn...