Hacker News new | ask | show | jobs
by Archelaos 703 days ago
It would mitigate the problem, but not solve it. You can still imagine a condition that only occurs after the update has been rolled out everywhere. Furthermore, such a bug would still be extremely problematic for the concerned customers, even if not all of them were affected. In addition, it would be necessary to react very quickly in the case of zero-day vulnerabilities.
2 comments

Yes, I am not arguing against having the ability to deal with it quickly - I am saying canary/ staging helps you do exactly that. Because as we see in the case of Intel CPUs and Crowdstrike some problems or scale of some problems is best prevented.
(semantic argument warning)

"Mitigation" is dealing with an outage/breakage after it occurs, to reduce the impact or get system healthy again.

You're talking about "prevention" which keeps it from happening at all.

Canarying is generic approach to prevention, and should not be skipped.

Avoiding the risk entirely (eBPF) would also help prevent outage, but I think we're deluding ourselves to say it "solves" the problem once and for all; systems will still go down due to bad deploys.