by amiga386 583 days ago
I'm not sure what you think is the insane failure mode?

The UK is part of the IFPS Zone, centrally managed by EUROCONTROL using ATFM. IFPS can accept/reject IFR flight plans, but the software at NATS can't. By the time NATS gets the flight plan, it has already been accepted. All their software can do is work out which parts enter the UK's airspace. If it's a long route, the plane has already taken off.

NATS aren't even thinking of a mixed-mode approach (for IFR flight plans) where they have both automated processing and manual processing of things the automated processing can't handle. They don't have a system or processes capable of that. And until this one flight, they'd never had a flight plan the automated system couldn't handle.

The failures here were:

1) a very unlikely edge case whose processing was specified, but wasn't implemented correctly, in the vendor's processing software

2) no test case for the unlikely edge case, because it was really _that_ unlikely; none of the experts involved in designing the spec imagined this could happen

3) they had the same vendor's software on both primary and secondary systems, so failover failed too; a second implementation might have succeeded where the first failed, but no guarantees

4) they had a series of incident management failures that meant they failed to fix the broken system within 4 hours, meaning NATS had to switch to manual processing of flight plans

1 comment

But that's the thing: part of the plan was that in a bad case they could switch to manual processing, yet at no point did anyone think to suggest manually processing the one failing plan.
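To make that concrete, here is a minimal sketch of the "set the one bad plan aside" idea. Everything in it is hypothetical (`process_all`, `reject_duplicates`, the shape of a plan as a list of waypoint codes); it is not how the NATS software is structured, and whether continuing past an unprocessable plan is acceptable in a safety-critical system is a separate question.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingResult:
    processed: list = field(default_factory=list)     # outputs of plans handled automatically
    manual_queue: list = field(default_factory=list)  # (plan, reason) pairs left for a human


def process_all(plans, process_one):
    """Run process_one on every plan; a failure quarantines that plan only."""
    result = ProcessingResult()
    for plan in plans:
        try:
            result.processed.append(process_one(plan))
        except Exception as exc:
            # Assumption: one bad plan should not stop the rest of the batch.
            result.manual_queue.append((plan, repr(exc)))
    return result


def reject_duplicates(plan):
    """Hypothetical stand-in processor that fails on repeated waypoint codes."""
    if len(set(plan)) != len(plan):
        raise ValueError("duplicate waypoint code")
    return "->".join(plan)
```

With `reject_duplicates` as the processor, a batch of three plans where one repeats a code ends up with two processed plans and one entry in `manual_queue`, rather than a halted system.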

I work with great QAs all day, and if one of them heard that duplicate area codes were possible, a bunch of test cases would appear covering all the possible combinations.
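Roughly what I mean by "all the possible combinations", as a sketch only. `find_exit_index` is a toy stand-in I made up, not the vendor's logic, and the three-letter codes are placeholders. With three codes and routes of length four, every generated route contains at least one duplicate.

```python
from itertools import combinations, product


def find_exit_index(route, entry_index, exit_code):
    """Toy function under test: first occurrence of exit_code after entry_index."""
    return route.index(exit_code, entry_index + 1)


def duplicate_route_cases(codes, length):
    """Yield every route over `codes` with every (entry, exit) index pair."""
    for route in product(codes, repeat=length):
        for entry_index, exit_index in combinations(range(length), 2):
            yield list(route), entry_index, exit_index


def run_cases(codes=("AAA", "BBB", "CCC"), length=4):
    """Check that the exit found is always after the entry and has the right code."""
    checked = 0
    for route, entry_index, exit_index in duplicate_route_cases(codes, length):
        found = find_exit_index(route, entry_index, route[exit_index])
        assert found > entry_index
        assert route[found] == route[exit_index]
        checked += 1
    return checked
```

The point is not this particular property but that the duplicates are generated exhaustively instead of relying on someone to imagine the one awkward route.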