> In that scenario, your surgeon would see the program suddenly freeze.

Only if the programmer or his management were incompetent. The display routine should be running on a separate thread from the processing code, so no whole-program freeze should occur. As for displaying random data, why would the programmer want to do this? Either display nothing, or display the last readings WITH a message that they're not real time.

It's not the same as a crash! Recovering from a crash takes a guaranteed minimum of 5 minutes. Resuming as soon as telemetry returns can take under a second in the best case, which can be the difference between a living patient and a dead one.

> If your environment fails, there's nothing you can do to recover. Planes aren't designed to survive the loss of a wing. Why is this case any different?

There are different kinds of failure: permanent and transient. Following the permanent procedure for a transient case can be fatal. Take your airplane example. Loss of a wing is permanent. That would be like the CPU failing or an external cable being cut. But your engines shutting down can be permanent or transient, just like disk I/O failing. You'd use the transient procedure in this case: keep trying to restart the engines. If they restart, great! You've just saved the plane. Same with the disk I/O. The programmer should keep trying to restart the I/O. If it comes back, great! You've just saved the patient.
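To make the "transient procedure" concrete, here is a minimal Python sketch, not anyone's actual device code. `TransientIOError`, `Monitor`, and `acquire` are hypothetical names invented for illustration, and the `read` callable stands in for whatever telemetry or disk source the real system uses. The idea is that the processing loop retries failed reads, while the display only ever looks at shared state and so never blocks on I/O.

```python
import threading


class TransientIOError(Exception):
    """Hypothetical error type for a failure that may clear on its own."""


class Monitor:
    """Holds the last good reading so the display never waits on I/O."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = None     # last good reading, if any
        self._live = False    # False means the reading is not real time

    def update(self, reading):
        with self._lock:
            self._last, self._live = reading, True

    def mark_stale(self):
        with self._lock:
            self._live = False

    def render(self):
        # Intended to be called from the display thread; reads shared state only.
        with self._lock:
            if self._last is None:
                return "NO DATA"
            if self._live:
                return f"{self._last}"
            return f"{self._last} (NOT REAL TIME)"


def acquire(read, monitor, stop, retry_delay=0.01):
    """Processing loop: retry transient failures instead of crashing."""
    while not stop.is_set():
        try:
            monitor.update(read())
        except TransientIOError:
            monitor.mark_stale()    # keep the last reading, flag it as old
        stop.wait(retry_delay)      # short pause, then try again
```

In use, `acquire` would run on its own `threading.Thread` while the display thread calls `render()` on a timer. A permanent failure (the cut cable) would need a different exception and a different procedure; this sketch only covers the transient case.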
Moreover, this is the kind of thing that should come up in robustness testing. Things should get bumped and wiggled. They should get unplugged and turned off. If the software is really going to run on random Windows boxes, then it should be tested on random Windows boxes. (At which point somebody will hopefully say, "Wow, this sucks, let's make it an appliance.")
No matter what happens, it shouldn't result in a "mysterious crash right in the middle of a heart procedure when the screen went black and doctors had to reboot their computer".
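The "bump and wiggle" testing can also be approximated in software with fault injection. The following is a rough Python sketch under stated assumptions: `FlakyDevice`, `survives`, and `careful_handler` are made-up names for illustration, and a real test rig would inject faults at the driver or hardware level as well.

```python
import random


class FlakyDevice:
    """Hypothetical test double: wraps a read function and injects failures."""

    def __init__(self, read, failure_rate, seed=0):
        self._read = read
        self._rate = failure_rate
        self._rng = random.Random(seed)   # seeded so a failing run can be replayed

    def read(self):
        if self._rng.random() < self._rate:
            raise IOError("injected fault: cable bumped")
        return self._read()


def survives(device, handle, iterations=1000):
    """Return True if handle() absorbs every injected fault without raising."""
    for _ in range(iterations):
        try:
            handle(device)
        except Exception:
            return False
    return True


def careful_handler(device):
    # Example handler under test: treats a failed read as "no new data".
    try:
        return device.read()
    except IOError:
        return None
```

A handler that lets the injected `IOError` escape makes `survives` return `False`, which is the software equivalent of the screen going black when someone bumps the cable.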