by Avernar 3700 days ago

> In that scenario, your surgeon would see the program suddenly freeze.

Only if the programmer or his management were incompetent. The display routine should be running on a separate thread from the processing code. No whole-program freeze should occur.
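A minimal sketch of that separation, assuming a simple producer/consumer split. The names `acquire`, `display`, and `run_demo` are hypothetical, and the stalled I/O is simulated with an event flag rather than a real blocked system call:

```python
import queue
import threading
import time


def acquire(readings: queue.Queue, stop: threading.Event, stall: threading.Event) -> None:
    """Processing thread: pushes samples, but can stall (simulated blocked I/O)."""
    sample = 0
    while not stop.is_set():
        if stall.is_set():          # simulated hang: produce nothing
            time.sleep(0.01)
            continue
        readings.put(sample)
        sample += 1
        time.sleep(0.01)


def display(readings: queue.Queue, stop: threading.Event, frames: list) -> None:
    """Display thread: never waits longer than one frame for new data."""
    last = None
    while not stop.is_set():
        try:
            last = readings.get(timeout=0.05)
            frames.append(("live", last))
        except queue.Empty:
            # Processing is stalled; keep drawing instead of freezing.
            frames.append(("stale", last))


def run_demo() -> list:
    readings, stop, stall, frames = queue.Queue(), threading.Event(), threading.Event(), []
    threads = [
        threading.Thread(target=acquire, args=(readings, stop, stall)),
        threading.Thread(target=display, args=(readings, stop, frames)),
    ]
    for t in threads:
        t.start()
    time.sleep(0.1)   # normal operation
    stall.set()       # processing side hangs
    time.sleep(0.3)   # display keeps producing frames meanwhile
    stop.set()
    for t in threads:
        t.join()
    return frames
```

The point of the timeout on `get` is that the display loop always comes back around, so a hang on the processing side shows up as "stale" frames rather than a frozen window.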

As for displaying random data, why would the programmer want to do this? Either display nothing or the last readings WITH a message that it's not real time.
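Something along these lines would do, as a rough sketch. The `Reading` type, the `render` function, and the one-second `max_age` threshold are all made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: float
    taken_at: float     # timestamp (seconds) from the acquisition side


def render(reading: Optional[Reading], now: float, max_age: float = 1.0) -> str:
    """Return the text to draw; never present old data as if it were current."""
    if reading is None:
        return "NO DATA"
    age = now - reading.taken_at
    if age > max_age:
        # Last known value, loudly marked as not real time.
        return f"{reading.value:.1f}  [NOT REAL TIME: {age:.0f}s old]"
    return f"{reading.value:.1f}"
```

For example, `render(Reading(72.0, 10.0), now=15.0)` returns the value with the warning attached, while `render(None, now=15.0)` shows nothing but "NO DATA".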

It's not the same as a crash! A crash costs a guaranteed minimum of 5 minutes. Restarting instantly after telemetry returns can happen in under a second in the best case, which can be the difference between a live and a dead patient.

> If your environment fails, there's nothing you can do to recover. Planes aren't designed to survive the loss of a wing. Why is this case any different?

There are different kinds of failure. Permanent and transient. Following the permanent procedure for a transient case can be fatal.

Take your airplane example. Loss of a wing is permanent. That would be like the CPU failing or an external cable being cut.

But your engines shutting down can be permanent or transient. Just like disk I/O failing. You'd use the transient procedure in this case. Keep trying to restart the engines. If they restart, great! You've just saved the plane.

Same with the disk I/O. The programmer should keep trying to restart the I/O. If it comes back, great! You've just saved the patient.
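In code, the transient procedure might look roughly like this. It is only a sketch: the `retry_transient` helper is hypothetical, and which error codes count as transient (the `TRANSIENT` set here) is an assumption that a real system would have to decide for itself:

```python
import errno
import time

# Assumption for illustration: these errno values are treated as transient.
TRANSIENT = {errno.EAGAIN, errno.EBUSY, errno.EACCES, errno.EINTR}


def retry_transient(operation, attempts=5, delay=0.01, sleep=time.sleep):
    """Call operation(); retry transient OSErrors, re-raise permanent ones."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            if exc.errno not in TRANSIENT or attempt == attempts:
                raise               # permanent failure, or out of retries
            sleep(delay)
            delay *= 2              # back off so we do not hammer the device
```

A permanent error (the lost wing) propagates immediately, while a transient one (the stalled engine) gets several restart attempts before the caller has to fall back to the permanent procedure.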

wpietri 3700 days ago

Definitely. Each component should do its best to keep on keeping on. The display program should keep displaying something, even if it's just the most recent data with a big "connection lost" warning. The device should ring-buffer the data and upon reconnection the screen should show as much as possible. The OS should have a strong opinion that the surgery app is very important, and that should the app fail, it should be restarted instantly.
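The ring-buffering part could be as small as this sketch. The `SampleRing` class and its `dropped` counter are hypothetical names, and nothing here reflects how the actual device stores data:

```python
from collections import deque


class SampleRing:
    """Fixed-size buffer that keeps the newest samples while the link is down."""

    def __init__(self, capacity: int) -> None:
        self._samples = deque(maxlen=capacity)  # oldest entries drop automatically
        self.dropped = 0

    def push(self, sample) -> None:
        if len(self._samples) == self._samples.maxlen:
            self.dropped += 1                   # record that history was lost
        self._samples.append(sample)

    def drain(self) -> list:
        """On reconnection, hand everything buffered to the display, oldest first."""
        out = list(self._samples)
        self._samples.clear()
        return out
```

Counting the dropped samples lets the screen say how much history is missing after a long outage instead of silently showing a gap.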

Moreover, this is the kind of thing that should come up in robustness testing. Things should get bumped and wiggled. They should get unplugged and turned off. If the software is really going to run on random Windows boxes, then it should be tested on random Windows boxes. (At which point somebody will hopefully say, "Wow, this sucks, let's make it an appliance.")

No matter what happens, it shouldn't result in a "mysterious crash right in the middle of a heart procedure when the screen went black and doctors had to reboot their computer".

sillysaurus3 3699 days ago

I had to step away from this conversation because of how aggressive you were being. Now that no one is watching, we might try to have a productive conversation.

Please consider dropping the adversarial attitude. This place isn't like other sites. The way people converse is equally important to what they say. It's better to transcend than to dominate.

For example, we do not slip in underhanded comments like this:

> > In that scenario, your surgeon would see the program suddenly freeze.

> Only if the programmer or his management were incompetent

This is just short of a personal attack, which is against the rules. I know you probably didn't mean it that way, but look at how you're framing the debate. I felt as if I'd been teleported onto Fox News and forced to defend myself from an aggressive interviewer's mischaracterizations.

Now, you can take the stance that "It's not against the rules, so I can say whatever I want." That's true, you can. But we're worse off for it. We optimize for good conversation here.

The point I'm trying to get across is that if you really throw yourself into this community, wholeheartedly and without a feeling of having to prove something wherever you go, then this place has a lot to offer. You'll meet a lot of interesting people, you'll hear a lot of interesting stories, and perhaps you'll have an opportunity to contribute to something quite unexpected. But none of that will happen if you try to skewer your opponents wherever you go -- or if you see people here as opponents. We're people.

It doesn't matter what the conversation is. It doesn't matter whether it's about life-or-death, or that this one happened to be about a surgery. The goal is to put yourself in the other person's shoes and to ask yourself, "If I were them, why would I say that?"

Regarding our conversation, if you want to continue it, I'd be happy to. But unless you're trying to learn as much from me as I'm trying to learn from you, it's not going to go anywhere productive. And what would be the point? No one's looking anymore -- it's fallen off the front page, so it's just you and me here. But why should our conversation be so different just because nobody is watching?

There are things to be said, but I have no time to defend myself. You can characterize what I was saying however you want. Or, alternatively, you could ask me what I meant.

I won't pull one of those "I've been in the field for a pretty long time, so I bet you'll learn something..." routines. Those are tired refrains, usually coming from people who have long forgotten what it's like to be young and hungry. But I'm still pretty young, and money's low enough that I'm pretty hungry. Being unable to afford meat is unfortunate, but it's worth not having a job for a little while to throw myself into my research. See why there's no time to defend against aggression?

I think I wrote this because in many ways, you remind me of how I used to be. And if I could go back in time, I'd ask myself what I was doing and why. This type of discourse is an intellectual dead-end. No one is going to learn a thing from watching people try to tear each other apart. Maybe you didn't realize that's what you were doing. It's very easy to slip into that mindset without realizing it.

> As for displaying random data, why would the programmer want to do this?

GPUs are bastards. They ignore what programmers want, almost by definition. And as someone who has spent way-too-many years wringing as much performance as possible from them, I assure you that this is a realistic characterization of a possible outcome.

Perhaps that piques your curiosity. If so, then that sounds like the start of a good conversation, no?

wpietri 3698 days ago

You shouldn't complain about aggressiveness when you start with it. You opened with an extreme position and denied all possibility of nuance. When multiple people suggested other possibilities, you made sweeping denials based on your imagination of how the program worked.

If you want nuanced dialog, start with nuance and make room for other people's opinions.

sillysaurus3 3698 days ago

If that's true, why not quote me? Point out some of these extreme positions. Point out the aggression. You say I started with it. Are you so sure?

Things I did not say:

- It's reasonable for a program fault to reboot a computer.

- It's reasonable not to check error conditions.

- It's reasonable for a program to halt and catch fire.

I wish I were making up that last one, but I'm not. Here are some unreasonable quotes:

> Think of your torrent software. If you crank your firewall to block it while it's running it will not crash. If your disk fills up it won't crash.

No one was saying it was okay for the program to crash.

> 'Halt and catch fire' is not generally considered a proportionate response

No one was advocating this.

> I've no idea where you're getting this 'unplugged the disk' thing from, AV software does not work this way.

The AV software denied all disk I/O. If you have nowhere to put data, and no access to data, then you don't have a disk. You have a paperweight.

> As for displaying random data, why would the programmer want to do this?

Obviously, the programmer did not "want" to do this. This is what happens when you try to do GPU programming and the I/O is suddenly cut. I've seen it, which is why I said it.

> But your engines shutting down can be permanent or transient. Just like disk I/O failing.

The disk I/O didn't fail. It was completely cut off by the AV program. There was no chance of it resuming until the scan completed, which could take far longer than the 5 minutes required to reboot the computer.

Speaking of which, here's where that stupid "the program rebooted the computer" myth stemmed from:

> According to one such report filed by Merge Healthcare in February, Merge Hemo suffered a mysterious crash right in the middle of a heart procedure when the screen went black and doctors had to reboot their computer.

To me, it sounds like they restarted the computer to get the AV program to stop. The program did not "crash so hard it rebooted the computer."

Here's how the program works:

> Merge Hemo consists of two main modules. The main component is the actual medical device, connected to the catheters, through which data acquisition takes place. This component is connected to a local PC or tablets via a serial port.

> The second component is a software package that runs on the doctor's computer or tablet and takes recorded data and logs it or displays it on the screen via simple-to-read charts.

So we see that the company does not have control over their environment. They have no say over what the doctor's computers are like. They have to live with the fact that the doctors' computers are running Windows, and that they run AV scans. It's not up to them.

This is important, because if the company had independently decided it was reasonable to deploy their software with an AV package, then the fault would lie with the company. But they didn't. Now, what can the company do?

Your point was that the software should behave gracefully in this environment. I agree; that was my point too.

The various people in this thread took what I said and morphed it into something so far from reality that I'm frankly a little worried that people are believing it. If I try to get a job, people might read this and conclude that I'm somehow advocating for 300-second crashes. Seriously?

My sole, singular point was this: Small programs are reliable programs. You can't have bugs in what you don't write.

That means a lot of things. But it does not mean "do not handle error conditions." I didn't even say that this program should exit. I said that the spectacular crash led to pinpointing the AV scan as the source of the issue.

I was called incompetent (indirectly), and told that my position was "extreme" and that I "denied all possibility of nuance." Ok. Sure.

I've re-read the entire article and this entire thread to double check myself and make sure that my assumptions are correct here, so if you see a mistake, please call it out with a quote.

I agree that I'm now being a little shall-we-say heated, and it's annoying that I'm now doing that because of how much I was provoked here. Actually, this is more amusing than annoying. If the whole world is claiming you came across poorly, then you came across poorly, regardless of what you think. I'm wondering where it all went wrong. So please, tell me: What aggression do you feel I started with? I'm genuinely hoping to learn here.

Isn't this all a little tedious? Why are we even doing this? Aren't there more interesting thoughts to think than litigating what someone did or didn't say? I don't know why this happened, and I don't know specifically what you want. But I'm open to suggestions.

sillysaurus3 3698 days ago

I apologize. I thought I was better than ranting, but apparently not. That wasn't cool.

Thank you for the advice. I appreciate it. Sorry for the sour grapes.

Avernar 3696 days ago

Seems that a good rant was what you needed. :D

As I wrote in my other two posts tonight (more like morning, sigh), I tend to tune out emotional and aggressive writing styles. That's probably why my writing style tends to look aggressive. It's just the type of debates I tend to end up in (sigh, again).

So I apologize again if that got you upset at me.

wpietri 3698 days ago

Whoa. I was not expecting that. Welcome! May all of Hacker News learn from your grace.

Avernar 3696 days ago

> The AV software denied all disk I/O. If you have nowhere to put data, and no access to data, then you don't have a disk. You have a paperweight.

AV software does not deny all disk I/O. It just denies write access to a file very briefly and then goes on to the next file; the article stated it was a scheduled scan.

So you do have a disk but a few files temporarily can't be written to (still can be read). The program will get an error code from the write function and can just try to write again.
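As a rough sketch of "try to write again": keep the records in memory until a write succeeds. The `BufferedLog` class and its injectable `opener` are hypothetical, and a real logger would also have to deal with partial writes, which this ignores:

```python
class BufferedLog:
    """Appends records to a file; keeps them in memory if the write fails."""

    def __init__(self, path: str, opener=open) -> None:
        self.path = path
        self.pending = []          # records not yet safely on disk
        self._open = opener        # injectable so a locked file can be simulated

    def write(self, record: str) -> bool:
        self.pending.append(record)
        return self.flush()

    def flush(self) -> bool:
        try:
            with self._open(self.path, "a", encoding="utf-8") as f:
                f.write("".join(line + "\n" for line in self.pending))
        except OSError:
            return False           # e.g. file locked by a scanner; try again later
        self.pending.clear()
        return True
```

Each later `write` or `flush` call retries everything still pending, so a file that is locked for a moment costs nothing except a short delay before the data reaches disk.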

> The disk I/O didn't fail. It was completely cut off by the AV program. There was no chance of it resuming until the scan completed, which could take far longer than the 5 minutes required to reboot the computer.

This is where you are incorrect. An AV scan does not lock the entire disk for the duration of the scan. It locks and releases each file as it scans them. Fire up Process Monitor from Sysinternals and look for yourself.

Tried it with my AV scanner with a manual scan. Looks like mine doesn't even do a lock on most of the files when doing a manual scan. So at most the file just couldn't be deleted.

> My sole, singular point was this: Small programs are reliable programs. You can't have bugs in what you don't write.

I pretty much agree with you on that one. Smaller programs are more reliable than larger programs.

> What aggression do you feel I started with? I'm genuinely hoping to learn here.

I found it humorous that you saw aggression in my words and that wpietri saw aggression in your words. Me, I just learned to tune out that sort of thing in other people's posts.

> I said that the spectacular crash led to pinpointing the AV scan as the source of the issue.

And that's what I was arguing about in my original post. You know what Root Cause Analysis is? If not, read up about it here: https://en.wikipedia.org/wiki/Root_cause_analysis

My argument was that while the crash identified the AV scan as a causal factor, it wasn't the root cause. From the Wikipedia article: "Though removing a causal factor can benefit an outcome, it does not prevent its recurrence within certainty."

The root cause was that the programmer didn't handle the error code that his file was locked. There are many more causal factors that can trigger the exact same outcome: indexing service, backup program, shadow copy, etc.

Unrelated to our debate, the medical company only blamed the AV software and the IT Staff. Not one mention that their program had a bug.

The fact that their release notes warned against AV software means that they knew their program was deficient. That's what really pisses me off.

Avernar 3696 days ago

> Please consider dropping the adversarial attitude.

A debate is by definition adversarial. I do tend to be more passionate when debating certain topics. If I've come across to you as aggressive I apologize. It's just my style of writing and you can freely ignore any aggression you see in it.

> > Only if the programmer or his management were incompetent

> This is just short of a personal attack, which is against the rules.

That comment wasn't about you so it can't be a personal attack against you. It was about a fictitious programmer and his fictitious management used in our examples.

Attacking what you've written is not a personal attack against you. I will rip your words apart, try to prove they are wrong, show where you've either made a faulty assumption or an error in logic. That's what a debate is.

I will never, ever under any circumstances attack you. If you can see that distinction I will gladly continue to debate with you.

This whole thing basically ballooned from this statement of yours:

> The antivirus basically unplugged the disk. What can it do to recover? There's nothing to be done.

Those are the words I'm challenging. You have two points there. The first is that the antivirus unplugged the disk. While I know you're not being literal you're not being accurate either. It locked one or more files.

The second was that there was nothing the program could have done. To this I gave an example of a program that does handle this exact situation and more.

From your other comment replying to someone else but still quoting me:

> > Think of your torrent software. If you crank your firewall to block it while it's running it will not crash. If your disk fills up it won't crash.

> No one was saying it was okay for the program to crash.

This is the example program I'm talking about. I wasn't implying that you think it was okay for the program to crash. I was giving you an example of a program that can handle disk and network error conditions without needing to restart itself (automatically or manually) and without crashing itself or the system.

> GPUs are bastards. They ignore what programmers want, almost by definition. And as someone who has spent way-too-many years wringing as much performance as possible from them, I assure you that this is a realistic characterization of a possible outcome.

I believe you in that situation. GPUs have been designed with speed in mind and that makes for a very complex interface to them. But that reinforces a few arguments made by others in regards to this story, namely whether medical equipment should be using hardware not specifically designed for the purpose.

But going back to my comment of:

> > As for displaying random data, why would the programmer want to do this?

Based on your GPU comment above we may be using different definitions of random data. I took your "random data" as "looks right on the screen but the numbers are wrong". If the programmer knows that his data source is temporarily unavailable, showing stale or corrupted data is the last thing he should do.
