I don’t have much sympathy for CrowdStrike but deploying slowly seems mutually exclusive to protecting against emerging threats. They have to strike a balance.
In CrowdStrikes case, they could have rolled out to even 1 million endpoints first and done an automated sanity/wellness check before unleashing the content update on everyone.
In the past when I have designed update mechanisms I’ve included basic failsafes such as automated checking for a % failed updates over a sliding 24-hour window and stopping any more if there’s too many failures.
yeah, I don't get the "we couldn't have tested it" crap, because "something happens to the payload after we tested it". Create a fake downstream company and put a bunch of machines in it. That's your final test before releasing to the rest of the world.