Hacker News
by sathackr 3700 days ago
Exactly this. As I was reading the article I hoped to find this exact point in the HN comments.

The fault lies in the bad software. It could have been the indexing service, online defrag, automatic updates, or any of the other various background processes Windows runs.

If it is critical software, it should be designed in a way to not fail when something non-critical malfunctions, and even the critical pieces should be built with redundancy.
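
To make that concrete, here is a minimal sketch of the idea, not anything from the product in the article. All names (`run_cycle`, `record_vitals`, `flaky_indexer`, `healthy_updater`) are made up for illustration. The critical task runs no matter what the background tasks do:

```python
from typing import Callable, Dict, List


def run_cycle(critical: Callable[[], str],
              non_critical: Dict[str, Callable[[], None]]) -> Dict[str, object]:
    """Run optional tasks in isolation, then always run the critical task."""
    failures: List[str] = []
    for name, task in non_critical.items():
        try:
            task()
        except Exception:
            # A background failure is recorded, never propagated.
            failures.append(name)
    return {"result": critical(), "failed_background": failures}


# Hypothetical tasks, for illustration only.
def record_vitals() -> str:
    return "vitals recorded"


def flaky_indexer() -> None:
    raise RuntimeError("index scan stalled")


def healthy_updater() -> None:
    pass


report = run_cycle(record_vitals,
                   {"indexer": flaky_indexer, "updater": healthy_updater})
print(report)
```

A try/except only covers tasks that fail loudly. A background process that hangs or locks files needs stronger isolation (separate processes, timeouts, resource limits), which this sketch does not attempt.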


I work for a medical devices company and I just want to say: We, specifically a few of us on the engineering staff, bring this sort of shit up constantly. I go hoarse having the same conversations over and over and over again about robustness in the face of failure, resiliency, redundancy, etc... The truth is that we're beholden to a board and an executive management team that, quite simply, doesn't give a fuck about our problems.

I'm not trying to excuse the company in the article or the company that I work for. And I do not work for the company in the article. I just wanted to point out that I do see how this can happen very easily and repeatedly.

I'm just curious. I work in the automotive sector and develop hardware and software using components that are advertised as functionally safe. I use hardened RTOSes from vendors who claim their RTOSes are in medical devices as well as military systems.

One such system is Disti (http://www.disti.com/)

In the automotive field, our software is MISRA compliant, static analysis is done (Klocwork - http://www.klocwork.com/) and we follow a very strict set of guidelines outlined in ISO 25119 and ISO 26262 for the construction and agricultural markets. Think self-driving tractors and combines. For example: a tractor traveling down a field with a combine following it a few rows over, separating and chopping things into a catcher, all done with one person driving.

This shit can't happen where I work. Every component on our circuit boards has an MTTFd of 40 years. Hardware watchdogs can kill the system if software goes awry.
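
For anyone who hasn't worked with one, a rough software model of the watchdog pattern looks like this. It is only a sketch under my own assumptions: a real hardware watchdog is an independent timer circuit that resets the processor, and the `Watchdog` and `FakeClock` classes and the 0.5 s timeout here are invented for the example.

```python
from typing import Callable


class Watchdog:
    """Software model of a hardware watchdog timer (illustrative only)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float]) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_kick = clock()

    def kick(self) -> None:
        """Called by healthy software to prove it is still making progress."""
        self.last_kick = self.clock()

    def expired(self) -> bool:
        """True once the software has gone too long without kicking."""
        return self.clock() - self.last_kick > self.timeout_s


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
dog = Watchdog(timeout_s=0.5, clock=clock)

clock.advance(0.3)
dog.kick()                 # main loop is alive and checks in
clock.advance(0.3)
healthy = dog.expired()    # still within the timeout

clock.advance(0.6)         # main loop hangs: no kick arrives
hung = dog.expired()       # timeout exceeded, hardware would reset here

print(healthy, hung)
```

The point of doing this in hardware is that the check keeps running even when the software it supervises is completely wedged.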

Software is written to readiness levels called SRL-1, SRL-2, etc... Unit tests, peer reviews, etc... Functional safety in medical devices is covered under 510(k) (http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedur...)

I find it amazingly shortsighted that antivirus software is even allowed on a medical device to begin with. I can't even imagine how this system even passed the easiest audit for software readiness.

How is it you "go hoarse having the same conversations?" Do you not have to meet FDA compliance criteria? Are you in the US?

I didn't read the whole article so I'm assuming this happened in the US. For me, we sell autonomous vehicles in the European markets, where functional safety seems to be a bit more aggressive right now for vehicles. Not sure about medical devices.

> I find it amazingly shortsighted that antivirus software is even allowed on a medical device to begin with.

Well... Then you should consider yourself blessed to have never had to deal with the bureaucracy of a hospital IT department and administrative staff.

Who owns the medical device? Who paid for it? If it's a glorified Windows machine and it's attaching itself to a hospital's Wi-Fi network... Who has to use this machine? Physicians, surgeons, anesthesiologists, radiologists, other specialists, nurses, staff? All of them need to be trained on its usage, no doubt. They don't get that training in schooling. Who provides it? This and a million other things stack up. So, well, I mean it can start to make sense how these things end up with random AV software installed on them, right?

> How is it you "go hoarse having the same conversations?" Do you not have to meet FDA compliance criteria? Are you in the US?

Yes, we are. Yes, we do "have to meet FDA compliance." I can't define "have to meet" and I work here. Of course, I'm just an engineer. We have legal, executive, and other staff for those matters. I'm sorry, I'm not trying to be an asshole... I'm just trying to be honest about where I find myself in this situation.

Sounds like an awesome job with good engineers but neglectful and irresponsible management.

If you are only making these warnings verbally, you might want to consider emailing your immediate manager with a list of concerns. Make it as neutral as possible and ask for guidance on how they want to address the issues. But if it's on the mail server, it will be good for discovery if the worst happens, and frankly given lives are at stake you probably need to show, in writing, that you were attempting to have the issues addressed.

Who knows? That might actually get traction. Might even save someone's life!

> Yes, we do "have to meet FDA compliance." I can't define "have to meet" and I work here. Of course, I'm just an engineer.

You are not an engineer. This is a protected term in the US and other countries. If you were a professional engineer, you would be bound by a legal and moral framework preventing you from doing work on unsafe medical equipment.

There is a good argument that there should be a software equivalent of protected engineer status for this kind of work. This kind of story should be a wake-up call. I personally had no idea that critical medical equipment would be running on MS Windows...

Engineer alone is not a protected term in the US. "Professional Engineer" is.

As of 2012 you can take the PE Exam for Software Engineering [1].

[1]: http://ncees.org/about-ncees/news/ncees-introduces-pe-exam-f...

Ahh, guess my info is out of date, thanks.

> How is it you "go hoarse having the same conversations?" Do you not have to meet FDA compliance criteria? Are you in the US?

You have to deal with FDA pretty much regardless where you're based, if you want any kind of market for your medical device. A lot of countries define compliance as whatever is good enough for FDA.

FDA rules for software... what FDA wants is a paper trail.

I'm with you...I can see exactly how this can happen.

Unfortunately the only thing that can solve the apathetic board and executive management problem (who only see dollar signs) is the actuality, or realistic possibility, of significant financial loss, or loss of their personal freedom (prison) due to the negligence of the system. And a $10 Mil fine for a fault in something that you make $100 Mil off of is not significant. That's $90 Mil profit in their eyes. And they probably get to write it off.

Even more unfortunate, is that, in the situation that this happens, the "engineers responsible" will be fired, and the executives will resign with a nice golden parachute, and go on to do the same thing somewhere else.

But then you have the company that does do it right and spends the time and the money to make a truly redundant, fault-tolerant system. But they come in at a price point 20% higher than their competitor, who doesn't. Which company survives and which doesn't?

Sad, but, unfortunately the way it is. I don't know a practical solution either.

I've thought about this a lot. I've had private conversations with the CEO which led me to believe that their apathy is a, if not the, primary driver in this situation, at least within the company. Ultimately, they are the single individual who can force these changes in the departments. As things stand today, as far as I can tell, the CEO and the rest of the executive team got theirs and that's that. Anything extra is just that, extra.

We've been close to undergoing "major" scrutiny (as it was sold to me, it was A Big Deal) from the FDA before. I, personally, just a lowly and underpaid engineer, have saved executive staff from having to sign their names on that noose. I had a manager once who seemed to want to push it that far, to stand idly by while the walls fell down around us. I, unknowingly at the time, prevented it from happening because I was trying to help our customers. I don't regret that decision; actual patients shouldn't have to suffer because of a management team's ineptitude. I do think about it often, though. I understand this is nebulous, and I'm sorry for that. This is a reality, though.

I guess that's the thing that really gets me, the FDA. We sell FDA approved devices. Where the fuck is the FDA? We send them paperwork and they are happy. I can only form the opinion that they, the FDA, are ill-prepared to handle this situation: the actual situation, the "the medical devices industry is a fucking train wreck waiting to happen" situation, and they are especially ill-prepared to handle it at scale. Audits are cursory and almost as a rule non-technical. I suppose it'll take a Toyota-level incident to bring change about.

Along the same lines as your 'where the fuck is the FDA' comment -- I've worked in Financial and Healthcare systems on and off for about the last 10 years.

I have seen SSAE16 audited companies that haven't patched anything in years. FDIC examined institutions with ATM machines still running OS/2 Warp (actually probably more secure than the ones running XP with no updates installed. Ever.)

I once found the management interface of a SAN with a public IP address directly on the device, no firewall rules of any sort, and the device still had the default username/password. It hadn't been patched or rebooted in over 2 years.

More shocking is that a review of the logs didn't show any successful unauthorized logins. Of course, they could have cleaned up after themselves, but further investigation was outside the scope of my engagement. (They didn't want to know. They were happy to present that, despite the oversight, there was no indication that PHI had been accessed by unauthorized people. Their conclusion, not mine.)

I can't help responding again. If you have tangible evidence of neglect or regulatory non-compliance, or even risks that are known about but not being dealt with by management - have you considered compiling this material and reporting it to the FDA?

But as I've said before - I really hope you have written down your concerns to someone in management. If it gets to the point where negligence takes out the company, there's going to be an attempt to make someone a scapegoat. Depending on your role in the company you don't want to be held personally liable for the incompetence and ruthlessness of management...

>Where the fuck is the FDA? We send them paperwork and they are happy.

When regulation becomes more about permission than proficiency, you'll get corruption instead of competence.

> Unfortunately the only thing that can solve the apathetic board and executive management problem (who only see dollar signs) is the actuality, or realistic possibility, of significant financial loss, or loss of their personal freedom (prison) due to the negligence of the system.

Or developers refuse to build software without safety built in.

If they can't hire anyone to build their unsafe systems, they'll have to start building safe software.

Let the market work for you.

That sounds nice...but then you will be replaced by a developer that will toe the company line. You're making 'unreasonable' demands and holding up progress. 'We can fix that with version 2.0'

If every developer on the planet suddenly had a pang of conscience, then something like this would work.

Fortunately I have never found myself in such a position, but I have seen it many many times.

That's why we should probably require engineering certifications for working on safety-critical software. Working on such software should require demonstrating a certain level of knowledge and upholding a code of ethics.

I generally oppose certification for engineers, but solving collective action dilemmas like this and saving lives in the process is exactly where it would help.

How do you ensure someone upholds a code of ethics? Licensing is not the answer. I'm sure there are many PEs that find themselves in similar situations.

I know examples of people in licensed fields who have sworn to uphold a code of ethics, but have been caught up in very similar situations.

I can't find it now but I just saw a video recently of a rail bridge with a crumbling foundation that had just been signed off on by a PE and declared safe by the railroad.

> get to write it off

A fine being tax deductible does not mean zero cost to the company, it means the profit is reduced before taxes are computed, i.e. the actual cost is reduced by the marginal tax rate. A tax credit means zero cost.
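
A quick worked example of the difference, as a sketch only. The $100M and $10M figures come from the comment upthread (treating the $100M as pre-tax profit is my assumption), the 30% marginal rate is an assumed number purely for illustration, and `net_income` is a made-up helper. Whether a given fine is actually deductible is a separate question this does not address.

```python
def net_income(profit, fine, rate, deductible=False, credit=False):
    """Profit left after paying the fine and income tax (all in $ millions)."""
    taxable = profit - fine if deductible else profit
    tax = taxable * rate
    if credit:
        tax -= fine  # a credit offsets tax owed dollar for dollar
    return profit - fine - tax


PROFIT, FINE = 100.0, 10.0  # from the thread, in $ millions (assumed pre-tax)
RATE = 0.30                 # assumed marginal tax rate, illustration only

no_fine = PROFIT - PROFIT * RATE  # baseline: what the company keeps with no fine

cost_plain = no_fine - net_income(PROFIT, FINE, RATE)
cost_deductible = no_fine - net_income(PROFIT, FINE, RATE, deductible=True)
cost_credit = no_fine - net_income(PROFIT, FINE, RATE, credit=True)

print(cost_plain, cost_deductible, cost_credit)
```

With these assumed numbers the fine costs the full $10M with no tax relief, about $7M (the fine times one minus the rate) if deductible, and nothing only in the tax credit case.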

It's not a decision which should be made at the level of executives though.

Presumably developers are the ones estimating how long things take. (If they're not, you have even bigger problems and I'm sorry.) The time to make it safe should automatically be included in those estimates.

Moreover, making it safe shouldn't be a separate part of the process. It should just be part of how you write software. It's either safe or it doesn't exist at all. (Compare this to how organizations like Google deal with concurrency: it's built in from the start.)

A reputable engineer wouldn't design and build a bridge which might collapse. A developer shouldn't build software which puts lives at risk, regardless of management pressure.

If they refuse to relent, there are plenty of jobs where safety isn't critical.

> Presumably developers are the ones estimating how long things take.

This is not meant as a slight: I think you're grossly unfamiliar with software development outside of engineering-driven companies.

It's pretty much a guarantee that product managers are deciding these estimates. They might confirm with the developers, but the conversation probably went something like this:

"Does 3 weeks sound about right for this?"

"No, we'll need 6"

"Why?"

"Safety checks"

"Ok, we don't have 6 weeks. I can give you 4, but we're just gonna have to make do."

Is it scary that conversation happened about a piece of medical software? Absolutely. Would I bet $1k that it happens frequently? Absolutely.

> A reputable engineer wouldn't design and build a bridge which might collapse

Rarely does a single engineer design a bridge nowadays, so corporate liability and reputation (good luck landing more contracts if your bridge collapses) is a huge factor in much of that beyond simple ethics.

I would be shocked if anything happened to Merge as a result of this, whereas a company who designed a faulty bridge would be sued into oblivion.

Further, professional engineering in the US is a whole different game that involves licensing and regulations specifically to avoid that situation. Software "engineering" has no such equivalent currently.

Pinning the blame on the peons is a sure-fire way to make sure this situation never changes.

Oh, I'm well aware of the difficulty of negotiating with product managers over timelines.

The difference is that they never should get the decision to cut safety checks. Cutting safety checks should be as ludicrous/impossible as writing half the code of each function to cut time.

The conversation should go like this:

PM: "Does 3 weeks sound about right for this?"

Dev: "No, we'll need 6"

PM: "Why?"

Dev: "That's how long it takes to build those 6 features."

PM: "Ok, we don't have 6 weeks. I can give you 4, but we're just gonna have to make do."

Dev: "Okay, which features would you like to cut?"

> Further, professional engineering in the US is a whole different game that involves licensing and regulations specifically to avoid that situation.

I'm aware. While I don't think the majority of software developers should be certified, we should require licensing for working on safety-critical applications.

> The conversation should go like this:

I think you're missing the end to that conversation:

>PM: "Ok, we don't have 6 weeks. I can give you 4, but we're just gonna have to make do."

> Dev: "Okay, which features would you like to cut?"

PM: "We can't cut any of them. We need features A, B, C in the product and we need it in 4 weeks."

Here we insert a rant from the PM about one of the following:

1) Leadership

2) Hard work

3) Threats about job security

4) Recalling that one time you delivered something ahead of schedule so why is this different

5) I see you getting up to get coffee at least twice a day so stop goofing off and get it done

I think you're vastly overestimating how much power/control said Dev has over the whole process at these sorts of companies.

Sure, they can quit, but if they felt empowered to quit they probably wouldn't be there in the first place: I don't think anyone's busting down the door to work at MedicalBusiness™.

> we should require licensing for working on safety-critical applications.

Fully agreed, though with some misgivings.

Incorporating safety-critical software into the "professional engineering" spectrum would almost certainly require some things that are seen as near-heresy to the software community, like requiring a 4-year degree from an ABET-accredited program.

Still, I agree.

I've managed to push back on PMs many times by redirecting them to trade between time and features (so they still feel like they're in control). That being said, you're right that I would never work somewhere that treats developers so poorly.

> Incorporating safety-critical software into the "professional engineering" spectrum would almost certainly require some things that are seen as near-heresy to the software community, like requiring a 4-year degree from an ABET-accredited program.

The vast majority of software isn't safety-critical, so there would still be plenty of opportunities for developers who don't fit into rigid molds.

I 100% oppose having accreditations for all developers.

I agree with your sentiment 100%, but if you insist on doing things safely while your colleagues do not, you might get a reputation for being slow and be earmarked for replacement. Perhaps it's worth losing a job over, but your replacement will cut corners so the net effect is patients unsafe + you have no job. It feels reminiscent of the prisoner's dilemma.

Accreditations and professional standards are literally textbook solutions for solving prisoner's dilemmas.

This. I'm in the same situation.

There are lots of faults. The software failed. The process that directed a helpdesk tech to install AV was a failure of some manager. The decision to engineer systems and networks in a way such that AV seemed like a good idea was a failure of an architect.

In my opinion, the software failed because the entire system (software, hardware, and humanware) failed to implement, holistically, a safety critical system. You simply cannot ignore the system as a whole. I'd wager we are in violent agreement. :)

>The decision to engineer systems and networks in a way such that AV seemed like a good idea was a failure of an architect.

As ever, a relevant xkcd: https://xkcd.com/463/