by dragonwriter 3036 days ago
As I gather, the Patriot was a mobile anti-aircraft / anti-cruise-missile platform that was meant to move, be activated when needed, and then be turned off and moved again, because the original location was expected to become a target. It was pressed on short notice into stationary, continuous-coverage, anti-ballistic-missile use, with some software upgrades but not the normal cycle of specs, development, and validation that would go into that kind of repurposing. Critically, that meant dealing with much faster targets than originally envisioned, which means short warning times, where deactivations carry a lot more risk.

So, while the results were horrible, it's easy to understand why basic functions would have specs not at all adapted to the use to which the system was being put.

1 comment

There's a distinction to be made, though. There was no requirement that it be rebooted after some period of time, though the original developers expected that this would happen. Consequently it was not evaluated for 20-hour or 100-hour performance. That's a critical distinction in developing, testing, and fielding systems, and the way we word our requirements documents reflects it. We rarely say: System SHALL fail after some period. Rather we say: System SHALL perform for some period. We leave the result of longer durations undefined. The system may work or it may not; we aren't required to test it, and so we don't. If the customer wants it to run longer, we can evaluate that, but they have to communicate it back to us (or to the testing facilities, which may not be the developers).

Similarly, with regard to the speed of the missiles, the requirement would not be: System SHALL fail to detect missiles above some threshold speed. But rather: System SHALL detect missiles below some threshold speed. This leaves open the possibility that it may be more or less accurate outside that range. That should be documented for the operators as a potential for failure: System may be ineffective against missiles operating above X m/s. But the requirements wouldn't include that detail.
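
To make the distinction concrete, here's a minimal Python sketch of a requirement treated as a verified envelope instead of a failure threshold. Everything in it is hypothetical: the names `Envelope` and `Coverage` and the numbers in the usage note are invented for illustration and don't come from any real requirements document.

```python
from dataclasses import dataclass
from enum import Enum


class Coverage(Enum):
    VERIFIED = "inside the tested requirement"
    UNDEFINED = "outside the tested requirement; behavior unknown"


@dataclass(frozen=True)
class Envelope:
    # Hypothetical limits standing in for "SHALL perform for N hours"
    # and "SHALL detect missiles below X m/s".
    max_runtime_hours: float
    max_target_speed_mps: float

    def classify(self, runtime_hours: float, target_speed_mps: float) -> Coverage:
        """Report whether an operating point was covered by testing.

        UNDEFINED does not mean the system fails there. It means nobody
        was required to find out.
        """
        inside = (
            runtime_hours <= self.max_runtime_hours
            and target_speed_mps < self.max_target_speed_mps
        )
        return Coverage.VERIFIED if inside else Coverage.UNDEFINED
```

With made-up limits such as `Envelope(8.0, 1000.0)`, an operating point of 4 hours and 500 m/s classifies as `VERIFIED`, while 20 hours at the same speed classifies as `UNDEFINED`. There is deliberately no "fails" outcome, which is the point: the requirement says nothing either way about the second case.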

This pushes the problem into the documentation and training. Since it was originally designed as a mobile platform with short run-times, there was no explicit operating procedure requiring reboots. It was just assumed. At the same time, the failure itself (after 20 hours) was unknown because testing hadn't been done to see what would happen.
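
For a sense of why run-time matters so much, here's a rough Python sketch of how a small timing error accumulates with uptime. The specifics are assumptions drawn from the widely cited public analyses of the incident, not from anything stated in this thread: a clock counting tenths of a second, converted to seconds using a value of 1/10 chopped to 23 fractional bits. The function names are hypothetical.

```python
from fractions import Fraction

TICKS_PER_SECOND = 10   # assumption: the clock counts tenths of a second
FRACTION_BITS = 23      # assumption: 1/10 is stored chopped to 23 fractional bits


def per_tick_error(fraction_bits: int = FRACTION_BITS) -> Fraction:
    """Exact error, in seconds, from using the chopped value of 1/10."""
    scale = 1 << fraction_bits
    stored = Fraction(scale // 10, scale)  # 1/10 truncated, not rounded
    return Fraction(1, 10) - stored


def drift_seconds(uptime_hours: int, fraction_bits: int = FRACTION_BITS) -> float:
    """Accumulated clock error after running continuously for uptime_hours."""
    ticks = uptime_hours * 3600 * TICKS_PER_SECOND
    return float(ticks * per_tick_error(fraction_bits))


for hours in (8, 20, 100):
    print(f"{hours:>3} h uptime: {drift_seconds(hours):.4f} s drift")
```

Under those assumptions the per-tick error is about 9.5e-8 s, which works out to roughly 0.07 s of drift after 20 hours and roughly 0.34 s after 100 hours. The error grows linearly with uptime, so a system that is rebooted every few hours never accumulates enough to matter, and a test plan that never runs past that duration never sees it.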