Hacker News new | ask | show | jobs
by sillysaurus3 3700 days ago
During a surgery, the program doesn't have the luxury of showing a screen that says "No telemetry available." Such a program would be considered equally unreliable. Worse, it would lead to confusion: "Why is the telemetry unavailable? What does 'Error Code 2931' mean?"

A spectacular crash immediately led to pinpointing the problem: The antivirus.

If the program's sole purpose is to transform a massive amount of data in real time, it must have disk access by definition. It can't not have disk access. What would you suggest it do?

2 comments

Yes it does! Showing "no telemetry available" is exactly what it should do. Crashing = unreliable. Reporting an error condition = reliable.

Immediately? Took them 5 minutes to reboot the computer. The scan of the folder would take seconds let alone minutes. Pinpointing the problem is secondary. Not killing the patient is primary.

> If the program's sole purpose is to transform a massive amount of data in real time, it must have disk access by definition. It can't not have disk access.

And that is the mind set the programmers of the software had. You have to take care of error conditions. The processing can't have no disk access but no disk access can occur temporarily or permanently. What can you do? Pause the processing part of your program. Or make the processing part treat "no data" as valid input and display something else.

Imagine taking that viewpoint with an ECG machine: This machine displays a heart rate waveform. So it must have a heart rate input. If there is no heart rate we'll just crash requiring a 5 minute reboot.

Hell no! Draw a straight line and set off a buzzer!

I agree with you, but the flat line might not be the best example because that has a very specific meaning (asystole) that doctors will take certain actions based on without necessarily trying to verify it manually when time is already critical. You should never be able to confuse an error message for anything else.
> You should never be able to confuse an error message for anything else.

Exactly. Which is why "Can't read file sensors.dat" is way better than just crashing. Crashing is one of the worst error messages you can get because you don't know what happened.

" Avernar 18 hours ago

Yes it does! Showing "no telemetry available" is exactly what it should do. Crashing = unreliable. Reporting an error condition = reliable.

Immediately? Took them 5 minutes to reboot the computer. The scan of the folder would take seconds let alone minutes. Pinpointing the problem is secondary. Not killing the patient is primary."

Well-put. Far back as Burroughs B5000, the best way to handle erroneous software or I/O was to freeze it, notify the administrator/user of the problem, and give them sensible options for how to proceed. They might restart the I/O, restart the app, modify erroneous data to proceed (rare here), and so on. Crash and reboot is a Windows 95/NT strategy where incompetence dominated. Today's Windows OS and tooling can do much better with little effort by developers.

But the spectacular crash didn't pinpoint the problem. That came afterwards, when the manufacturer was able to look into the crash.

In a situation like this, confusion is all but inevitable. As a developer, the goal should be to minimize that confusion to the greatest extent possible. A blank screen and crash introduces another step to the process as people wonder "what's going on?" instead of "shit, it threw an error." It's probably not a big deal, but with medical devices during surgery, that extra step could be hugely problematic.

I wonder what surgeons think about software engineers? Except for open heart surgery, they don't normally do their fixes by stopping and starting the thing they are repairing...