by Rochus
1011 days ago
I worked in safety-critical ATC projects in engineering and management positions (systems, quality and compliance engineering) for a decade. ATC systems are supposed to not fail, even under adverse conditions. Where high availability is required for safety reasons, redundant architectures are one of the options. Apparently the "backup system" was conceived for this purpose. According to the report (page 17), the responsible subsystem suffered from a "critical exception [..] that triggers the conditions that led to the incident", which caused both the primary and the backup system to fail, and which has now apparently been fixed. So obviously the system was not supposed to fail on receiving wrong or suspicious flight plan data, and it was apparently pure luck that no such data arrived for five years. To claim that the subsystem (consisting of the primary and backup system) "safely failed" indicates significant gaps in safety management (either faulty safety analyses, faulty specifications, or faulty configuration or software). The report suggests that critical omissions occurred at several levels.
The purpose of a backup system is not to prevent failure - it's to improve the resiliency of the system as a whole across a set of foreseen and unforeseen faults. A backup system failing to handle a specific fault is expected and predicted behavior. Thankfully, in this case there was a further backup that prevented a complete shutdown (and, thankfully, any accident) - the manual processing of flight plans.