UK air traffic control outage caused by bad data in flight plan

	UK air traffic control outage caused by bad data in flight plan (theguardian.com)
	24 points by orobinson 1020 days ago

6 comments

mytailorisrich 1020 days ago

"Nats said the failure was due to “an extremely rare set of circumstances” with two identically named but separate waypoint markers outside the UK’s airspace"

Sounds like a fairly common error case to check for, although they say it had never happened before.

ID collisions are always something to check for, not least when the data are user inputs.
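As an illustration of how cheap such a check can be, here is a minimal sketch that flags identifiers mapping to more than one distinct location. It assumes waypoints are available as `(identifier, latitude, longitude)` tuples; the function name and the data are hypothetical, not taken from the Nats system or report.

```python
from collections import defaultdict

def find_id_collisions(waypoints):
    """Return identifiers that map to more than one distinct location.

    `waypoints` is an iterable of (identifier, latitude, longitude) tuples
    (a hypothetical format chosen for this sketch).
    """
    locations = defaultdict(set)
    for ident, lat, lon in waypoints:
        locations[ident].add((lat, lon))
    # Keep only identifiers that point at two or more different places.
    return {ident: sorted(locs) for ident, locs in locations.items() if len(locs) > 1}

# Hypothetical data: identifiers and coordinates are illustrative only.
waypoints = [
    ("ABC", 51.0, -1.0),
    ("XYZ", 52.0, 0.5),
    ("ABC", 40.0, -100.0),
]
print(find_id_collisions(waypoints))
# {'ABC': [(40.0, -100.0), (51.0, -1.0)]}
```

The same identifier listed twice at identical coordinates is not reported, since that is a harmless duplicate rather than a collision.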

bell-cot 1020 days ago

And checking for ID collisions is generally extremely easy, both in code and compute.

And the ID collision was in user data which the air traffic system has to continuously accept during operations.

And it sounds like that data breaks down into individual flight plans - so it might be trivial to reject just one flight plan, and allow the rest to proceed.
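The "reject one plan, keep processing the rest" idea can be sketched as per-plan error isolation. Everything here is hypothetical: the plan format (a dict with `callsign` and `route`), the validation rules, and the `FlightPlanError` type are assumptions for illustration, not how the real system is structured.

```python
class FlightPlanError(Exception):
    """Raised when a single flight plan cannot be processed (hypothetical)."""

def process_plan(plan):
    # Hypothetical validation: a plan needs a callsign and at least two waypoints.
    if not plan.get("callsign"):
        raise FlightPlanError("missing callsign")
    if len(plan.get("route", [])) < 2:
        raise FlightPlanError("route needs at least two waypoints")
    return f"{plan['callsign']}: {' -> '.join(plan['route'])}"

def process_batch(plans):
    """Process every plan; quarantine failures instead of halting the batch."""
    accepted, rejected = [], []
    for plan in plans:
        try:
            accepted.append(process_plan(plan))
        except FlightPlanError as exc:
            # Set the bad plan aside for manual handling and move on.
            rejected.append((plan.get("callsign") or "<unknown>", str(exc)))
    return accepted, rejected

plans = [
    {"callsign": "AB123", "route": ["AAA", "BBB"]},
    {"callsign": "CD456", "route": ["AAA"]},
]
accepted, rejected = process_batch(plans)
# accepted == ["AB123: AAA -> BBB"]
# rejected == [("CD456", "route needs at least two waypoints")]
```

Only the known, plan-specific error is caught; an unexpected exception still propagates, so genuine bugs are not silently swallowed.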

BUT...doubtless the UK's flight control software came out of some multi-billion-pound government boondoggle. So we should be grateful that it doesn't crash planes into each other, or send innocent postal workers to jail for theft, and overlook these sorts of failures.

cameronh90 1019 days ago

Most bugs are pretty easy to solve once you know they're there.

It's all well and good to say that software just shouldn't have bugs, but that's pretty much an unsolved problem at this point. The NATS system has a relatively good track record, and even companies with exemplary engineering standards have occasionally had large system failures.

Let he who is without sin cast the first stone.

matthewmacleod 1019 days ago

I have no doubt that there are huge numbers of issues with many large-scale IT projects, but this sort of cynical and hyperbolic armchair analysis makes it even harder to have rational conversations that help prevent systems failures in the future.

Consider reading the actual initial Nats report https://publicapps.caa.co.uk/docs/33/NERL%20Major%20Incident... – this provides a bunch of interesting analysis and technical information.

I'm sorry for being mean about it, but it's a personal bugbear of mine when complex systems failures are boiled down to lazy analysis.

robjampar 1020 days ago

"reject just one flight plan, and allow the rest to proceed."

Rejecting a plan wouldn't necessarily mean the flight no longer exists or won't take off, so that doesn't sound sensible.

bell-cot 1020 days ago

Flight delays / cancellations / diversions (due to mechanical problems, weather, etc.) are a very regular thing - the airlines, ground crews, commercial pilots, and control towers have lots of experience with "Flight 1234 won't be taking off..." and "Flight 2345 is being diverted to...".

Or, if it's a "Bob owns a Cessna, and took off anyway" situation - well, Bob's license to fly a private airplane will probably be taken away. Maybe his Cessna, too. And (post-9/11) Bob could be spending some time in uncomfy little rooms with bars on the windows.

gbil 1019 days ago

To add: in my personal experience in the air force, flight plan rejection by Eurocontrol was business as usual, so I'm also confused to read that the system threw in the towel instead of rejecting the one problematic plan.

mytailorisrich 1020 days ago

I admit I have no idea how the system works but if there is an obligation to submit a flight plan in advance then there should also be a standard procedure not to let planes take off or enter airspace if they don't. At the very least there should indeed be a procedure to reject the flight plan even if flight cannot be stopped.

jebarker 1020 days ago

> "not to let planes ... enter airspace"

How?

Scoring6931 1019 days ago

Airspaces are already a thing in aviation, and pilots need to seek permission to transit between them. This is done by making the request to air controllers via radio.

https://skybrary.aero/articles/classification-airspace

Scoring6931 1019 days ago

With regards to postal workers: https://www.bbc.com/news/business-56718036

darkclouds 1019 days ago

It may well have happened before, but changes such as moving to a new version of a language, adopting updated libraries, and the other changes that creep into new language versions to minimise bugs inevitably introduce bugs of their own when there is no unit or component test to flag them. It also suggests that different programmers have worked on the system.

nickdothutton 1020 days ago

Having read the actual report... insufficiently rigorous validation of inputs leads to discovery of corner case.

They could have probably found this sooner with either fuzzing or perhaps some sort of digital twin model.
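A minimal sketch of the fuzzing idea: feed random strings to a parser and record any input that raises something other than its documented rejection error. The toy route format (space-separated 2 to 5 letter uppercase identifiers), `parse_route`, and `RouteParseError` are all hypothetical, and plain random generation stands in for the coverage-guided fuzzers used in practice.

```python
import random
import string

class RouteParseError(ValueError):
    """The only exception the parser is allowed to raise on bad input."""

def parse_route(text):
    # Hypothetical toy format: waypoint identifiers separated by single spaces,
    # each 2 to 5 uppercase letters.
    tokens = text.split(" ")
    for tok in tokens:
        if not (2 <= len(tok) <= 5 and tok.isalpha() and tok.isupper()):
            raise RouteParseError(f"bad waypoint token: {tok!r}")
    return tokens

def fuzz(parser, iterations=10_000, seed=0):
    """Run `parser` on random strings; return inputs that raised unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    alphabet = string.ascii_letters + string.digits + " -/"
    failures = []
    for _ in range(iterations):
        candidate = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        try:
            parser(candidate)
        except RouteParseError:
            pass  # expected, documented rejection of malformed input
        except Exception as exc:  # anything else is a corner-case bug
            failures.append((candidate, repr(exc)))
    return failures
```

`fuzz(parse_route)` should come back empty, whereas a careless parser such as `lambda t: t.split()[0]` is quickly caught raising `IndexError` on empty or whitespace-only input.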

Finally, there's no exit clause to reject a flight plan from an "upstream"? That is a worry.

zooFox 1020 days ago

If there are two waypoint markers with the same name, how does the flight control and/or aircraft software know which one is being referred to? If it assumes the closest, it would already have needed a special case for this, no?

e.g. if I want to drive to Springfield, it needs to know which one out of 67 I'd like to go to...
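The "assume the closest" approach could look something like the sketch below: among candidate locations sharing a name, pick the one nearest the previous waypoint by great-circle (haversine) distance, and refuse to guess if none is plausibly near. This is an assumption about how disambiguation might work, not a description of any real flight-planning system; `resolve` and the `max_km` threshold are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def resolve(candidates, previous, max_km=1000.0):
    """Pick the candidate nearest `previous`; raise if none is plausibly close.

    `max_km` is a hypothetical plausibility threshold.
    """
    best = min(candidates, key=lambda c: haversine_km(previous, c))
    if haversine_km(previous, best) > max_km:
        raise ValueError("no candidate is plausibly near the previous waypoint")
    return best

# Hypothetical: two same-named waypoints, previous route point in southern England.
print(resolve([(51.0, -1.0), (40.0, -100.0)], previous=(50.5, -0.5)))
# (51.0, -1.0)
```

Raising instead of silently choosing means an ambiguity that cannot be resolved confidently ends up as a rejected plan rather than a wrong route.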

abenbow 1020 days ago

Link to the report from NATS (PDF) https://publicapps.caa.co.uk/docs/33/NERL%20Major%20Incident...

speg 1019 days ago

My current project involves processing flight plans. I believe the company even helped build part of NATS. There must be something else going on to crash the whole system.

We get so many invalid flight plans from third parties (e.g., ForeFlight) that the system would never be up if we didn’t mark them as invalid and move on to the next.

ChrisArchitect 1019 days ago

Feels like a [dupe]

ChrisArchitect 1019 days ago

News from a week ago mostly, with a number of posts:

https://news.ycombinator.com/item?id=37320322

https://news.ycombinator.com/item?id=37328377

https://news.ycombinator.com/item?id=37312648

ChrisArchitect 1019 days ago

More earlier discussion over here: https://news.ycombinator.com/item?id=37401864