|
>> the real failure in the story of the Therac-25 from my understanding, is that it took far too long for incidents to be reported, investigated and fixed. > the earlier (manually operated) version of the machine did have the same fault. But it also had a failsafe fuse that blew so the fault never materialized. #1 virtue of electromechanical failsafes is that their conception, design, implementation, and failure modes tend to be orthogonal to those of the software. One of the biggest shortcomings of Swiss Cheese safety thinking is that you too-often end up using "neighbor slices from the same wheel of cheese". #2 virtue of electromechanical failsafes is that running into them (the fuse blew, or whatever) is usually more difficult for humans to ignore. Or at least it's easier to create processes and do training that actually gets the errors reported up the chain. (Compared to software - where the worker bees all know you gotta "ignore, click 'OK', retry, reboot" all the time, if you actually want to get anything done): But, sadly, electromechanical failsafes are far more expensive then "we'll just add some code to check that" optimism. And PHB's all know that picking up nickles in front of the steamroller is how you get to the C-suite. |
About a decade ago a rep from Videojet straight up lied to us about their 30W CO2 marking laser having a hardware interlock. We found out when - in true Therac-25 fashion - the laser kept triggering despite the external e-stop being active due to a bug in their HMI touch panel. No one noticed until it eventually burned through the lens cap. In reality the interlock was a separate kit, and they left it out to reduce the cost for their bid to the customer. That whole incident really soured my opinion of them and reminded me of just how bad software "safety" can get.