Hacker News
by SamuelAdams 693 days ago
I used to work in healthcare IT. Running a code (responding to a cardiac or respiratory arrest) is not always just CPR.

Different medications may be pushed (injected into the patient) to help stabilize them. These medications are recorded via a barcode and added to the patient's chart in Epic. Epic is the source of truth for the current state of the patient, so if it suddenly becomes unavailable, that is a big problem.

Makes sense, thank you for the explanation.
Okay, not having historical data available to decide what to give a patient is understandable (but maybe also print the critical information per patient once a day?), but not being able to log an action in real time should not be a critical problem.
It is a critical problem if your entire record of the life-saving drugs you've given them in the past 24 hours suddenly goes down. You have to start relying on people's memories, and it's made worse by shift turnovers, so the relevant information may not even be reachable once the previous shift has gone home.

There are plenty of drugs that can only be given in certain quantities over a certain period of time, and if you go beyond that, it makes the patient worse not better. Similarly there are plenty of bad drug interactions where whether you take a given course of action now is directly dependent on which drugs that patient has already been given. And of course you need to monitor the patient's progress over time to know if the treatments have been working and how to adjust them, so if you suddenly lose the record of all dosages given and all records of their vital signs, you've lost all the information you need to treat them well. Imagine being dropped off in the middle of nowhere, randomly, without a GPS.
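The "certain quantities over a certain period of time" rule is essentially a rolling-window sum over the administration record, and the interaction rule is a lookup against the same record; both become impossible once that record is gone. A minimal sketch, assuming a hypothetical in-memory history and made-up drug names, limits, and interaction pairs (not real clinical values):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Administration:
    drug: str
    dose_mg: float
    given_at: datetime

# Hypothetical limits for illustration only: drug -> (max total mg, rolling window).
LIMITS = {"drug_a": (4000.0, timedelta(hours=24))}
# Hypothetical pairs that should not be given within 24 hours of each other.
INTERACTIONS = {frozenset({"drug_a", "drug_b"})}

def can_give(history: list[Administration], drug: str, dose_mg: float,
             now: datetime) -> tuple[bool, str]:
    """Check a proposed dose against known interactions and the rolling-window limit."""
    for past in history:
        recent = now - past.given_at < timedelta(hours=24)
        if recent and frozenset({past.drug, drug}) in INTERACTIONS:
            return False, f"interacts with {past.drug} given at {past.given_at:%H:%M}"
    if drug in LIMITS:
        max_mg, window = LIMITS[drug]
        # Sum only the doses of this drug that fall inside the window.
        given = sum(a.dose_mg for a in history
                    if a.drug == drug and now - a.given_at < window)
        if given + dose_mg > max_mg:
            hours = window.total_seconds() / 3600
            return False, f"would total {given + dose_mg:g} mg in the last {hours:g} h"
    return True, "ok"
```

Without the history list, neither check can be answered, which is the commenter's point.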

That's why there's a sharpie in the first aid kit. If you're out of stuff to write on you can just write on the patient.

More seriously, we need better purpose-built medical computing equipment that runs on its own OS and only has outbound network connectivity, for updating other systems.

I also think of things like the old-school "checklist boards" that used to be literally built into the yoke of the airplane they were made for.

I’m afraid the profitability calculation shifted it in favor of off-the-shelf OS a long time ago. I agree with you, though, that a general purpose OS has way too much crap that isn’t needed in a situation like this.
> That's why there's a sharpie in the first aid kit.

That doesn't help when the system goes down and you lose the record of all medications administered prior to having to switch over to the Sharpie.

> It is a critical problem if your entire record of life-saving drugs you've given them in the past 24 hours suddenly goes down.

Will outages like this motivate a backup paper process? The automated process should save enough information on paper that switching over to a paper process at any time is feasible, similar to elections.

Maybe if all the profit-seeking entities were removed from healthcare, that money could instead go to the development of useful offline systems.

Maybe a handheld device for scanning in drugs or entering procedure information that stores the data locally. It could then sync with a larger device with more storage somewhere that is also 100% local and immutable, which in turn could sync to online systems if needed.
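That is a classic store-and-forward design. A minimal sketch of the "local and immutable" part, assuming a hash-chained append-only log (a technique swapped in here for illustration, not something the commenter specified) so that edits to earlier entries are detectable, and a hypothetical `upload` callback standing in for whatever the upstream system is:

```python
import hashlib
import json
from typing import Callable

GENESIS = "0" * 64

def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

class LocalLog:
    """Append-only event log; each entry commits to the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self.synced = 0  # number of entries already forwarded upstream

    def append(self, event: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; an edited entry, or one removed from the middle, breaks it."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != _digest(prev, e["event"]):
                return False
            prev = e["hash"]
        return True

    def sync(self, upload: Callable[[dict], bool]) -> int:
        """Forward unsynced entries in order; stop at the first failure and retry later."""
        sent = 0
        while self.synced < len(self.entries):
            if not upload(self.entries[self.synced]):
                break  # upstream unavailable: keep working locally
            self.synced += 1
            sent += 1
        return sent
```

Recording never depends on the network; only `sync` does, and it simply resumes where it left off.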

And with their luck, those handheld devices will also be sent the OTA update that temporarily bricks them along with everything else.
no money for that

there are backup paper processes, but they start fresh when the systems go down

If it were printing paper 24/7 in case of downtime, it would be a massive waste for the 99% of the time the system is up.

A good system is resilient. A paper process could take over when the system is down. From my understanding, healthcare systems undergo recurrent outages for various reasons.
Many places did revert to paper processes. But it’s a disaster model that has to be tested to make sure everyone can still function when your EMR goes down. Situations like this just reinforce that you can’t plan for if IT systems go down; you plan for when they go down.
My experience with internet outages affecting retail is that the ability to rapidly and accurately calculate bill totals and change is not practiced much anymore. Not helped by things like 9.075% tax rates, to be sure.
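For reference, this is the arithmetic a cashier has to do by hand at that rate, sketched with exact decimal math. Rounding the tax half-up to the cent is an assumption; the actual rule depends on the jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP

TAX_RATE = Decimal("0.09075")  # 9.075 %
CENT = Decimal("0.01")

def tax_on(subtotal: str) -> Decimal:
    """Tax rounded half-up to the cent (an assumed rounding rule)."""
    return (Decimal(subtotal) * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)

def total_with_tax(subtotal: str) -> Decimal:
    return Decimal(subtotal) + tax_on(subtotal)

def change_due(total: Decimal, tendered: str) -> Decimal:
    return Decimal(tendered) - total
```

For a $23.47 subtotal, the tax is 2.1299025 before rounding ($2.13 after), the total is $25.60, and the change from $40.00 is $14.40.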
How about an e-ink display for each patient that gets drug and administration info displayed on it?
Real paper is probably as much about breaking from the "IT culture" as it is about the physical properties. An e-ink display would probably help with a power outage, but it would happily display a BSOD in an incident like this.
Honestly, if you were designing a system to be resilient to events like this one, the focus would be on distributed data and local communication, the exact sort of things that have become basically dirty words in this SaaS future we are in. Every PC in the building, including the ones tethered to equipment, is presently basically a dumb terminal dependent on cloud servers like Epic's, meaning the WAN connection is a single point of failure (hopefully a hospital has a credible backup ISP, though?), and the same goes for the Epic servers.

If medical data were synced to the cloud but also stored on the endpoint devices and local servers, you’d have more redundancy. Obviously much more complexity to it but that’s what it would take. Epic as single source of truth means everyone is screwed when it is down. This is the trade off that’s been made.
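A minimal sketch of the "also stored on the endpoint" idea, assuming a hypothetical `fetch_remote` function for the cloud system: every successful read refreshes a local replica, and when the remote call fails the reader gets the last known copy, explicitly flagged as stale along with when it was fetched, instead of nothing.

```python
import time
from typing import Callable, Optional

class ReplicatedChart:
    """Read-through cache: the remote system is primary, the local copy is the fallback."""

    def __init__(self, fetch_remote: Callable[[str], dict]) -> None:
        self.fetch_remote = fetch_remote  # hypothetical client for the cloud EMR
        self.local: dict[str, tuple[float, dict]] = {}  # patient_id -> (fetched_at, chart)

    def read(self, patient_id: str) -> tuple[Optional[dict], bool, Optional[float]]:
        """Return (chart, is_stale, fetched_at); chart is None if never fetched."""
        try:
            chart = self.fetch_remote(patient_id)
        except (ConnectionError, TimeoutError):
            cached = self.local.get(patient_id)
            if cached is None:
                return None, True, None
            fetched_at, chart = cached
            return chart, True, fetched_at
        fetched_at = time.time()
        self.local[patient_id] = (fetched_at, chart)
        return chart, False, fetched_at
```

The stale flag and timestamp matter: a clinician reading a fallback copy needs to know how old it is. Writes during an outage are a harder problem (they need a local log and conflict handling on resync), which is the "much more complexity" the comment mentions.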

I don't think the data is required to make a decision; it is required to record the action for future reference. This is ultimately to bill you, to track that a doctor isn't stealing medication or improperly treating the patient, and for legal purposes.

Some hospitals require you to input this in order to even get physical access to the medications.

That said, a crash cart would normally have the common items needed to save someone in an emergency, so I would think that if someone was truly dying, they could get what they needed. But of course there are going to be exceptions, and a system being down will only make the process harder.

> maybe also print critical stuff per patient once a day?

Yep, the business continuity boxes are basically minimally connected PDF archives of patient records "printed" multiple times a day.
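Mechanically, that kind of downtime archive is a periodic job that writes a self-contained snapshot to local storage and keeps the last few. A sketch, assuming plain-text files rather than PDFs, a made-up file-naming scheme, and a hypothetical `KEEP` retention count:

```python
from datetime import datetime
from pathlib import Path

KEEP = 3  # hypothetical: number of snapshots to retain locally

def write_snapshot(out_dir: Path, patients: dict[str, list[str]], now: datetime) -> Path:
    """Write one human-readable snapshot file and prune the oldest beyond KEEP."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"snapshot-{now:%Y%m%dT%H%M%S}.txt"
    lines = [f"Downtime snapshot generated {now:%Y-%m-%d %H:%M}"]
    for patient_id, items in sorted(patients.items()):
        lines.append(f"\n== {patient_id} ==")
        lines.extend(f"  {item}" for item in items)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Timestamped names sort chronologically, so the oldest files come first.
    snapshots = sorted(out_dir.glob("snapshot-*.txt"))
    for old in snapshots[:-KEEP]:
        old.unlink()
    return path
```

Run on a schedule (say, several times a day), this gives a readable copy that survives the central system going down, as long as the machine holding it is not hit by the same failure.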

maybe non-volatile e-paper, which can be updated easily if things are up, and if the system is down it still works as well as the printouts
updatable e-paper is going to be very expensive
Compared to managing thousands of printers? And then the resulting printouts? Buying ink, changing the cartridges?

Technologically it seems doable. Big enough order brings down the costs.

https://soldered.com/product/soldered-inkplate-5-5-2%e2%80%b...

Of course the real backup plan should be designed based on the actual needs, perhaps the whole system needs an "offline mode" switch. I assume they already run things locally, in case the big cable seeker machine arrives in the neighborhood.

A small printer connected to the scanner should do.
in this case, it's the entire operating system going down on all computers, so I don't think the printers are working either
Most printers in these facilities run standalone on an embedded Linux variant. They can actually host whole folders of data for reproduction "offline". In fact, most scan/print/fax multifunction machines can do that these days. If the onsite IT is good, though, the USB ports and storage on those devices should be locked down.
Looks like a small scanner + printer running a small minimalistic RTOS would be a good solution.
OK, now you have a fleet of 200 of those devices to manage. And now you move a patient to another department or another hospital, and then...

Reality is complex.

Oh yes. This would be a contingency measure, just to keep the record in a human-readable form while requiring little manual labor. Printed codes could be scanned into Epic later and, if you need to transfer the patient, you tear off the paper and send it with them.
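A sketch of what such a contingency printout could carry: a human-readable line for staff plus a compact data line that could be scanned back in later, with a CRC-32 check so a misread line is rejected instead of silently imported. The field layout is an assumption for illustration, not an Epic import format, and it assumes no field contains a `|`.

```python
import zlib
from datetime import datetime

def receipt(patient_id: str, drug: str, dose: str, given_at: datetime) -> str:
    """Return a two-line receipt: readable text plus a checksummed data line."""
    readable = f"{given_at:%Y-%m-%d %H:%M}  {patient_id}  {drug} {dose}"
    data = "|".join([patient_id, drug, dose, given_at.isoformat(timespec="minutes")])
    crc = zlib.crc32(data.encode("utf-8"))
    return f"{readable}\n{data}|{crc:08X}"

def parse_data_line(line: str) -> dict:
    """Validate the checksum before trusting a scanned data line."""
    payload, _, crc_hex = line.rpartition("|")
    if zlib.crc32(payload.encode("utf-8")) != int(crc_hex, 16):
        raise ValueError("checksum mismatch; re-enter manually")
    patient_id, drug, dose, when = payload.split("|")
    return {"patient_id": patient_id, "drug": drug, "dose": dose,
            "given_at": datetime.fromisoformat(when)}
```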
This.

Anyone involved in designing and/or deploying a system where an application outage threatens life safety, should be charged with criminal negligence.

A receipt printer in every patient room seems like a reasonable investment.

This would be challenging. Establishing crowdstrike’s duty to a hospital patient would be challenging if not impossible in some jurisdictions.
It is not necessarily crowdstrike's responsibility, but it should be someone's.

If I go to Home Depot to buy rope for belaying at my rock climbing center and someone falls, breaks the rope and dies, then I am on the hook for manslaughter.

Not the rope manufacturer, who clearly labeled the packaging with "do not use in situations where safety can be endangered". Not the retailer, who left it in the packaging with the warning, and made no claim that it was suitable for a climbing safety line. But me, who used a product in a situation where it was unsuitable.

If I instead go to Sterling Rope and the same thing happens, fault is much more complicated, but if someone there was sufficiently negligent they could be liable for manslaughter.

In practice, to convict of manslaughter, you would need to show that an individual was negligent. However, our entire industry is bad at its job, so no individual involved failed to perform their duties to a "reasonable" standard.

Software engineering is going to follow the path that all the other disciplines of meatspace engineering did. We are going to kill a lot of people, and every so often enough people will die that we add some basic rules for safety-critical software, until eventually this type of failure occurring without gross negligence becomes nearly unthinkable.

It's on whoever runs the hospital's computer systems. Allowing a ring 0 kernel driver to update ad hoc from the internet is just sheer negligence.

Then again, the managers who put this in are probably the same idiots who insist on a 7-day-lead-time CAB process to fix a typo on a brochureware website "because risk".

This patient is dead. They would not have been if the computer system was up. It was down because of CrowdStrike. CrowdStrike had a duty of care to ensure they didn't fuck over their client's systems.

I'm not even beyond two degrees of separation here. I don't think a court'll have trouble navigating it.

I suppose it will come as a surprise to you that you have misleading intuitions about the duty of care.

CrowdStrike did not even have a duty of care to their customer, let alone their customer’s customer (speaking for my jurisdiction, of course).

If that really were how it worked, I don’t think that software would really exist at all. Open Source would probably be the first to disappear too — who would contribute to, say, Linux, if you could go to jail for a pull request you made because it turns out they were using it in a life or death situation and your code had a bug in it. That checks all the same boxes that your scenario does: someone is dead, they wouldn’t be if you didn’t have a bug in your code.

Now, a tort is less of a stretch than a crime, but thank goodness I’m not a lawyer so I don’t have to figure out what circumstances apply and how much liability the TOS and EULAs are able to wash away.

When I read something like this that has such a confident tone while being incredibly incorrect all I can do is shake my head and try to remember I was young once and thought I knew it all as well.
I don't think you understand the scale of this problem. Computers were not up to print from. Our Epic cluster was down for placing and receiving orders. Our lab was down and unable to process bloodwork - should we bring out the mortar and pestle and start doing medicine the old fashioned way? Should we be charged with "criminal negligence" for not having a jar of leeches on hand for when all else fails?
I was advocating for a paper fallback. That means that WHILE the computers are running, you must create a paper record, e.g. “medication x administered at time y”, hence the receipt printers, which are cheap and low-dependency.

The grandparent indicated that the problem was that when all the computers went down, they couldn’t look up what had already been done for the patient. I suggested a simple solution for that: receipt printers.

After the computers fail, you tape the receipts to the wall and fall back to pen and paper until the computers come back up.

I completely understand the scale of the outage today. I am saying that it was a stupid decision and possibly criminally negligent to make a life critical process dependent on the availability of a distributed IT application not specifically designed for life critical availability. I strongly stand by that POV.

> I suggested a simple solution for that - receipt printers.

Just so I understand what you are saying: you are proposing that we constantly drown our hospital rooms in paper receipts, on the off chance that the computers go down, which happens very rarely?

Do you see any possible drawbacks with your proposed solution?

> possibly criminally negligent to make a life critical process dependent on the availability of a distributed IT application

What process is not “life critical” in a hospital? Do you suggest that we don’t use IT at all?

Modern medicine requires computers. You literally cannot provide medical care in a critical care setting with the sophistication and speed required for modern critical care without electronic medical records. Fall back to paper? Ok, but you fall back to 1960s medicine, too.
We need computers. But, how about we fall back to an air-gapped computer with no internet connection and a battery backup?

Why does everything need the internet?

This approach is also what popped into my head. I've seen people use whiteboards for this already, so it must be OK from a HIPAA standpoint.
A hospital my wife worked at over a decade ago didn't use EMRs; it was all on paper. Each patient had a binder per stay, and for many of them it rolled into another binder. (This was a neuro-ICU, so generally lengthy patient stays with lots of activity, but nothing super-unusual or Dr. House stuff; every major city in America will have 2-3 different hospitals with that level of care.)

But they switched over to EMRs because the advantages of Pyxis[1] in getting the right medications to the right patients at the right time, and documenting all of that, are so large that for patient safety reasons alone it wins out over paper. You can fall back to paper; it's just a giant pain in the ass to do it, and then you have to do the data entry to get it all back into the EMR. My wife, for example, was working last night when everyone else in her department got CrowdStrike'd. She created a document to track what she did so it could be transferred into the EMR once everything comes back up, and the document was over 70 pages long! Just for one employee for one shift.

1: Workflow: Doctor writes prescription in EMR. Pharmacist reviews charts in EMR, approves prescription. Nurse comes to Pyxis cabinet and scans patient barcode. Correct drawer opens in cabinet so the proper medication- and only the proper medication- is immediately available to nurse (technicians restock cabinet when necessary). Nurse takes medication to patient's room, scans patient barcode and medication barcode, administers drug. This system has dramatically lowered the rates of wrong-drug administration, because the computers are watching over things and catch humans getting confused on whether this medication is supposed to go to room 12 or room 21 in hour 11 of their shift. It is a great thing that has made hospitals safer. But it requires a huge amount of computers and networks to support.
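The bedside step in that workflow boils down to matching two scans against the pharmacist-approved orders. A minimal sketch, assuming hypothetical barcode strings and an in-memory order list (a real system checks much more, such as dose, route, and timing, and this is not how Pyxis or Epic is actually implemented):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    patient_id: str  # value encoded on the patient's wristband barcode (hypothetical)
    drug_code: str   # value encoded on the medication barcode (hypothetical)
    approved: bool   # set once the pharmacist has reviewed the order

def verify_bedside_scan(orders: list[Order], patient_scan: str, med_scan: str) -> str:
    """Return 'ok' only if this drug has an approved order for this patient."""
    matching = [o for o in orders
                if o.patient_id == patient_scan and o.drug_code == med_scan]
    if not matching:
        # Catch the room 12 vs. room 21 mix-up: is this drug ordered for someone else?
        others = sorted({o.patient_id for o in orders if o.drug_code == med_scan})
        if others:
            return f"STOP: this medication is ordered for {', '.join(others)}, not {patient_scan}"
        return f"STOP: no order for this medication for {patient_scan}"
    if not any(o.approved for o in matching):
        return "STOP: order not yet approved by pharmacy"
    return "ok"
```

Every branch needs the order list, which is why losing the network takes this safety net down with it.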

This would be a disaster from a HIPAA perspective, and an unimaginable amount of paperwork.
For relying on Windows to run this kind of stuff and not doing any kind of staged rollout, but just blindly applying untested third-party kernel driver patches fleet-wide? Yeah, honestly. We had safer rollouts for cat videos than y'all seem to have for life-critical systems. Maybe some criminal liability would make y'all care about reliability a bit more.
Staged rollout in the traditional sense wouldn't have helped here, because the skanky kernel driver worked under all test conditions. It just didn't work when it got fed bad data. This could have been mitigated by staging the data propagation, or by fully testing the driver with bad data (unlikely to ever have been done by any commercial organization). Perhaps some static analysis tool could have found the potential to crash (or the isomorphic "safe language" that doesn't yet exist for NT kernel drivers).
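The "testing with bad data" idea, at toy scale: a parser for an update file that validates every length and field and rejects malformed input instead of reading past the end of it, plus a crude random-input fuzz loop. The file format here (a magic header, a record count, fixed-size records) is entirely hypothetical and is not CrowdStrike's channel file format; it is in Python for brevity, but the same discipline applies to a C driver.

```python
import random
import struct

MAGIC = b"UPD1"
RECORD = struct.Struct("<IH")  # hypothetical record: 32-bit id, 16-bit flags

class BadUpdate(Exception):
    pass

def parse_update(blob: bytes) -> list[tuple[int, int]]:
    """Parse a hypothetical update file; raise BadUpdate on anything malformed."""
    if len(blob) < 8 or blob[:4] != MAGIC:
        raise BadUpdate("bad header")
    (count,) = struct.unpack_from("<I", blob, 4)
    expected = 8 + count * RECORD.size
    if len(blob) != expected:  # never trust the count without checking the length
        raise BadUpdate(f"length {len(blob)} != expected {expected}")
    return [RECORD.unpack_from(blob, 8 + i * RECORD.size) for i in range(count)]

def fuzz(iterations: int = 10_000, seed: int = 0) -> int:
    """Feed truncated and corrupted inputs; any exception other than BadUpdate escapes as a bug."""
    rng = random.Random(seed)
    good = MAGIC + struct.pack("<I", 2) + RECORD.pack(1, 0) + RECORD.pack(2, 1)
    rejected = 0
    for _ in range(iterations):
        blob = bytearray(good[: rng.randrange(len(good) + 1)])
        for _ in range(rng.randrange(4)):
            if blob:
                blob[rng.randrange(len(blob))] = rng.randrange(256)
        try:
            parse_update(bytes(blob))
        except BadUpdate:
            rejected += 1
    return rejected
```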
If you don't see that the thing that happened today that blew up the world was the rollout, I don't know what to tell you.
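What "staging the data propagation" means in practice: push the update to progressively larger rings of machines and stop as soon as a ring fails its health check, so a bad file takes out a canary group rather than the whole fleet. A minimal sketch; the ring fractions, the `deploy` and `healthy` callbacks, and the all-hosts-must-pass rule are assumptions:

```python
from typing import Callable, Sequence

def staged_rollout(
    hosts: Sequence[str],
    ring_fractions: Sequence[float],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Deploy ring by ring; halt before the next ring if any host in the current one is unhealthy.

    ring_fractions are cumulative, e.g. (0.01, 0.1, 0.5, 1.0).
    Returns (hosts updated so far, whether the rollout completed).
    """
    done: list[str] = []
    for fraction in ring_fractions:
        target = max(1, round(len(hosts) * fraction))
        ring = list(hosts[len(done):target])
        for host in ring:
            deploy(host)
        done.extend(ring)
        if not all(healthy(h) for h in ring):
            return done, False  # stop here; the remaining hosts never see the update
    return done, len(done) == len(hosts)
```

With 100 hosts and rings of 1%, 10%, 50%, and 100%, an update that crashes every machine it touches stops after the first host instead of reaching all 100.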
A QR code can store about 3 KB of data. Every patient has a small QR sticker printer on their bed. Whenever Epic updates, it prints a new small QR sticker. When a patient is moved, tear off the sticker and stick it to their wrist tag.

That much of the patient's state will be carried on their wrist. Maybe complex cases need two stickers. You would have to be judicious in encoding data, maybe just the last 48 hours.

Offline handheld QR readers that read and display the QR data strings.
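A rough sketch of the "judicious encoding" step, assuming the largest QR code (version 40 at the lowest error-correction level, which holds 2,953 bytes in byte mode), a made-up compact text layout that any generic reader can display as-is, and a split across extra stickers when one is not enough. Generating the QR image itself would need a separate library and is left out.

```python
from datetime import datetime, timedelta

QR_MAX_BYTES = 2953  # QR version 40, error-correction level L, byte mode

def sticker_payloads(patient_id: str, events: list[tuple[datetime, str, str]],
                     now: datetime, hours: int = 48,
                     max_bytes: int = QR_MAX_BYTES) -> list[str]:
    """Pack the last `hours` of (time, drug, dose) events into QR-sized text payloads."""
    cutoff = now - timedelta(hours=hours)
    lines = [f"{t:%m%d %H%M} {drug} {dose}" for t, drug, dose in sorted(events) if t >= cutoff]
    # Reserve room for a header like "<id> i/n" (assumes at most 99 stickers).
    header_budget = len(f"{patient_id} 99/99\n".encode("utf-8"))
    chunks: list[list[str]] = [[]]
    for line in lines:
        candidate = "\n".join(chunks[-1] + [line]).encode("utf-8")
        if len(candidate) + header_budget > max_bytes and chunks[-1]:
            chunks.append([])  # start a new sticker (a single oversized line still gets its own)
        chunks[-1].append(line)
    n = len(chunks)
    return [f"{patient_id} {i + 1}/{n}\n" + "\n".join(c) for i, c in enumerate(chunks)]
```

Plain text costs some capacity compared with compressed binary, but it means any offline reader can show a nurse the dose history without special software; the "i/n" header tells the reader whether a second sticker exists.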