Not if you are running critical systems and the antivirus is not 100% guaranteed to be safe and running in an isolated environment.
Hospitals should not lose their ability to provide care to sick people just because of a misconfiguration of an antivirus. That is as bad as airplanes crashing because of a lack of redundancy and risk management.
Not on production critical systems where there are human lives at stake. Last Friday is a pretty good example of what comes with ungoverned ‘autoupdate’.
So let's imagine that it has to be updated manually. A new threat appears, and since a manual update takes a while, bad actors can act on it in the meantime, causing a similar or even worse disruption: with malicious intent behind it, the impact could be far more severe.
"Immediate across the fleet" and "Entirely manual process" are not the only two options. HN rules say we must assume good faith, but there are obviously options in between, and all of them stop the issue that happened on Friday.
Your argument is that the 0.01% of cases should dictate the actions of the other 99.99%?
I would pick automated testing and staggered fleet deploys. There's no reason this should take more than 1-2 hours in any enterprise, which is a perfectly acceptable window of risk.
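A staggered deploy like that can be sketched roughly as below. This is only an illustration: `staged_rollout`, the `apply_update` and `is_healthy` callbacks, and the ring fractions are all made-up names and numbers, not anything a real vendor ships.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health probe
    ring_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed ring sizes
) -> list[str]:
    """Update the fleet ring by ring and halt at the first unhealthy ring.

    Returns the hosts that were updated before the rollout finished or halted.
    """
    updated: list[str] = []
    done = 0
    for fraction in ring_fractions:
        # Each ring extends the rollout to `fraction` of the fleet,
        # always adding at least one new host.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        ring = hosts[done:target]
        for host in ring:
            apply_update(host)
            updated.append(host)
        done = target
        # Stop before touching the next ring if anything in this ring broke.
        if not all(is_healthy(host) for host in ring):
            break
        if done == len(hosts):
            break
    return updated
```

With a 100-host fleet and an update that breaks every machine it touches, this stops after the first ring, so one host goes down instead of a hundred.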
I'm not fully sure what you mean by 0.01% cases? Where did you get those percentages?
Businesses are under a constant barrage of cyber attacks whose goals are to steal data, encrypt it, and then blackmail the victim or sell the data. Ransomware payouts exceeded $1 billion last year. And that doesn't include all the damage done besides the payouts.
Edit: Supposedly global cost of cybercrime is expected to reach $20 trillion+ by 2027.
If you need to have automatic updates, then you need to perform a risk analysis of what would happen if that system fails.
A typical solution would be to have two machines, one with the automatic updates and a second one without automatic updates that jumps in in case the first one breaks down.
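As a rough sketch of that two-machine setup (the `Node` type, the `health_check` probe and `choose_active` are invented for illustration, and a real failover would also need state replication and fencing):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    auto_update: bool                   # True for the primary, False for the pinned standby
    health_check: Callable[[], bool]    # hypothetical probe, e.g. a heartbeat


def choose_active(primary: Node, standby: Node) -> Optional[Node]:
    """Prefer the auto-updated primary; fall back to the standby if it is down."""
    if primary.health_check():
        return primary
    if standby.health_check():
        return standby
    return None  # both machines are down


# Example: the primary was bricked by a bad update, the standby takes over.
primary = Node("primary", auto_update=True, health_check=lambda: False)
standby = Node("standby", auto_update=False, health_check=lambda: True)
active = choose_active(primary, standby)
```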
>A typical solution would be to have two machines, one with the automatic updates and a second one without automatic updates that jumps in in case the first one breaks down.
Great, now the other one is still vulnerable and hackers can still steal information from it.
The proper solution is a hardened machine built for critical systems: no internet access, USB disabled, email attachments blocked, etc.
However that isn't popular and most orgs would prefer a day of downtime from this type of outage vs the hassle and cost of doing it right.
Realistically what is the alternative if you are running servers that could seriously be the target of an attack?
I will give you that I highly doubt a large number of these machines are anywhere near that critical, but some will carry that much risk.
What do you do, just not update to handle new risks? A lot of systems going down is really bad, don't get me wrong. But is it worse that you could be breached depending on the data (and other services) those systems may have access to?
To me this is a flaw in Crowdstrike but also Windows that this could happen in the first place, and a serious flaw on Crowdstrike's side that this somehow got out.
And yes I do acknowledge that much of this is security theatre, but I also would not be surprised if it does sometimes work.
To be clear, you blame CrowdStrike, Windows (??) but not the companies who picked this software, configured it and wrote their own internal risk policies around a kernel level piece of software?
Most of the blame here falls on Crowdstrike, both from a software standpoint, in that it can cause a BSOD so easily and cannot handle something like this happening, and for whatever failure let that file get out.
Some, minor, blame falls on Windows due to its ability to BSOD as easily as it does.
As far as the companies, it is a tricky situation. Many of the companies have Crowdstrike enabled and automatic updates turned on to check some audit box. They have to keep the updates going out regularly.
We are well past the point in tech where a company is solely responsible for its systems; external dependencies are the norm, whether through the shared security model of cloud services like AWS or a reliance on external APIs and servers. You have to trust that the vendor you are working with for whatever critically important system is going to do their job. You could look back and say that maybe you chose the wrong vendor for a specific piece of software, but this could have happened to other vendors.
Something I am not entirely sure of is whether those audit, compliance, etc. requirements allow an alternative update method. This would differ for each compliance regime, but to the best of my knowledge, for security software most want you to have automatic updates.
If this was the case of all of these servers going down because of a major AWS outage would you really be saying the companies are to blame?
> Many of the companies have Crowdstrike enabled and automatic updates turned on to check some audit box. They have to keep the updates going out regularly.
While many companies probably do that, it's usually not required if you can argue for an alternative approach and how it fits your risk appetite better (e.g. progressive updates on a routine schedule).
At some point you have to; you will never control 100% of the system between your servers and whoever or whatever will be interacting with it, and between your servers and whatever other services you have to work with.
There might be smaller parts of your system for which you could say this, but it only holds if your system is 100% airgapped, all of the wiring, servers, etc. are put down by you, and you are working with a LAN.
There are not many systems that fall within that definition. As soon as you hit using the internet for communication you are reliant on your ISP working. Maybe you can have a redundant connection, but then you have to assume both of those will do their job and that they don't have a dependency that could bring them both down.
So no, it's not absurd unless you are never going to the internet. You have to make the decisions on what your system relies on and what it can handle.
I fully understand what this brought down, but again there are plenty of other instances where you assume an outside company is going to do their job.
Looking back and saying "well, maybe this was a bad idea because it's an external dependency" isn't helpful when we can point to any number of other external dependencies that may not have brought down as many systems but can just as easily bring down critical systems.
It's a trade off. That said, we're in an age where companies do 100+ pushes per day. Automate a build, run a test, then deploy rolling updates across the fleet.
The options aren't "everyone auto updates or no updates for weeks"; there's a balance point. It's very clear which choice most critical companies made this week, though.
Maybe if my antivirus has basic filtering of input values. But in a critical systems scenario, I want to validate in my testing stage first, or at least run a split rollout so that my entire fleet doesn't shit the bed.
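By "basic filtering of input values" I mean something along these lines. The comma-separated rule format, the field count and the function names are all hypothetical, not how any real product stores its definitions:

```python
from typing import Iterable

EXPECTED_FIELDS = 4  # assumed shape of one rule record, purely illustrative


def validate_rules(lines: Iterable[str], expected_fields: int = EXPECTED_FIELDS) -> list[list[str]]:
    """Parse an update and reject it if any record is malformed."""
    parsed: list[list[str]] = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(",")
        # Reject records with the wrong number of fields or with empty fields.
        if len(fields) != expected_fields or any(not f.strip() for f in fields):
            raise ValueError(f"malformed rule on line {number}")
        parsed.append(fields)
    if not parsed:
        raise ValueError("update contains no rules")
    return parsed


def load_update(new_lines: Iterable[str], current_rules: list[list[str]]) -> list[list[str]]:
    """Switch to the new rules only if they validate; otherwise keep the old ones."""
    try:
        return validate_rules(new_lines)
    except ValueError:
        return current_rules
```

The point is that a bad update gets rejected and the last known good rules stay in place, instead of the parser falling over on garbage.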