Interesting, do you have a link to this statement? Also, do they state what did cause the crash? At least removing the file of zeroes does solve the problem, as the instructions from both Microsoft and CrowdStrike state: "Boot into safe mode. Delete C-00000291*.sys." That's the file(s) with the zeroes... See https://www.crowdstrike.com/falcon-content-update-remediatio... and https://www.youtube.com/watch?v=Bn5eRUaMZXk (3 minutes 20 seconds in).
AFAIK in one of the older CrowdStrike threads, there was a tweet that said the driver checks the file for a sentinel value of AAAAA... before loading it, so an entirely blank file wouldn't have caused the issue. I can't find the source now, but some comments do seem to corroborate it:
Right, they write rather cryptically "This is not related to null bytes contained within Channel File 291 or any other Channel File."
That's not quite the same as saying "This is not related to Channel File 291 containing all nul bytes."...
I don't have first-hand knowledge here, but am relying on Dave Plummer's statement.
Regardless of zeroes or single files or not, the fact is that bad data in C-00000291.sys in combination with bad validation in the driver causes it to crash. Deleting C-00000291.sys causes the driver to stop crashing.
Anyway, my main point isn't really about this. It's that the big-bang global rollout, to at least 8.5 million systems simultaneously, is irresponsible.
The driver architecture is the lesser evil here, although it's bad enough!
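For contrast with a big-bang rollout, here is a minimal sketch of a staged one. Everything in it is hypothetical and for illustration only: `staged_rollout`, the `deploy` and `is_healthy` callbacks, the stage fractions, and the failure threshold are assumptions, not anything CrowdStrike is known to use.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> list[str]:
    """Deploy to growing fractions of the fleet; halt when a stage looks unhealthy.

    Returns the hosts that actually received the update.
    """
    updated: list[str] = []
    done = 0  # number of hosts already updated
    for fraction in stage_fractions:
        # Cumulative target for this stage, always at least one new host.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        batch = hosts[done:target]
        if not batch:
            break  # whole fleet already covered
        for host in batch:
            deploy(host)
        updated.extend(batch)
        done = target
        # Check the hosts just updated before widening the rollout.
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            break  # halt; the remaining hosts keep the old content
    return updated
```

With 1,000 hosts and stages of 1%, 10%, and 100%, an update that crashes every machine would stop after the first 10 hosts instead of reaching all of them.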
> the fact is that bad data in C-00000291.sys in combination with bad validation in the driver causes it to crash
This is, in fact, not a fact. We really don't know yet.
CrowdStrike blue screened one of my laptops twice right as the incident was getting started, before a fix was available. There was no boot loop in my case. I was back up and in the middle of an episode of Breaking Bad the second time it got me, 30 minutes after the first. Did the agent wait that long to load a content update it had already loaded before? Maybe, but it's at least as likely that the content was loaded the whole time, and that some activity pattern set it off. Thus, I'm skeptical of the problem being simple content validation.
Yes - subsequent to my comment. Thanks.
But how can this latest statement be true, if the previous statement that the crash was not related to the zero-byte content is also true?
Good question. There's some evidence that not all affected systems have seen this 'all zeroes' file; the first-hand accounts vary. But something was definitely broken in the deployed data. Once again, CrowdStrike does not paint a clear picture: it raises new questions and only partially answers old ones.
Why is it so hard for manufacturers to just go ahead and explain what really went wrong, without a lot of corporate b..t? Probably, if they really did say what happened in so many words, they might open themselves up to negligence lawsuits. Hopefully somebody files one anyway. The industry needs to learn to be better, and the only thing that talks loudly enough is probably money: lost revenue, liability damages, and loss of shareholder value.
Speculation: this "all zero" file is part of a signed batch. They have to have signatures; they are not that dumb (I hope...). By removing a file, the batch becomes incomplete, fails the check, and some corruption recovery mechanism takes over, most likely disabling the current update and triggering a new one. In the meantime, they fixed the content update, fixing the crash.
https://news.ycombinator.com/item?id=41005546
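To make the "signed batch" speculation above concrete, here is a rough sketch of how deleting one file could make a batch fail its integrity check. This is purely an assumption for illustration: the manifest format, the function names, the file names in the usage note, and the use of HMAC as a stand-in for a real public-key signature are all made up, and nothing here reflects how the Falcon sensor actually works.

```python
import hashlib
import hmac
import json


def sign_manifest(files: dict[str, bytes], key: bytes) -> tuple[bytes, bytes]:
    """Build a manifest of SHA-256 digests and sign it (HMAC as a stand-in)."""
    manifest = json.dumps(
        {name: hashlib.sha256(data).hexdigest() for name, data in files.items()},
        sort_keys=True,
    ).encode()
    return manifest, hmac.new(key, manifest, hashlib.sha256).digest()


def batch_is_intact(
    files: dict[str, bytes], manifest: bytes, signature: bytes, key: bytes
) -> bool:
    """Return True only if the manifest is authentic and every listed file matches."""
    expected = hmac.new(key, manifest, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return False  # manifest was tampered with
    listed = json.loads(manifest)
    if set(listed) != set(files):
        return False  # a file is missing or unexpected: batch incomplete
    return all(
        hashlib.sha256(files[name]).hexdigest() == digest
        for name, digest in listed.items()
    )
```

In this sketch, a batch signed with hypothetical files `C-00000290-example.sys` and `C-00000291-example.sys` passes the check as shipped, even if one file is all zeroes, because the signature only proves the content is what was signed, not that it is valid. Deleting the `291` file makes `batch_is_intact` return `False`, which is the point where a recovery mechanism could kick in.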