Hacker News new | ask | show | jobs
by ggambetta 3034 days ago
> Hope is a strategy [...] At some point we have to hope and assume. for example eventually we hope the compiler authors did a good job with the next version we are about to use, or we assume that the kernel fix was good.

Or you can do a very slow, controlled rollout of the new version and see what happens. With all the systems I worked with while at Google, both as a SRE and a SWE, whenever we had a new version to release, we'd update one task in one cluster, let it run for a day or two, check the logs and the metrics... if it was OK, update one job in one cluster, then repeat the process with an entire cluster (the designated canary cluster), and only then release to the rest of the clusters. If any of these went wrong, we'd rollback or patch, depending on the severity. We rarely missed our weekly releases.

I'm sure the same applies to a new compiler version or a new kernel fix. You don't need to assume anything.

1 comments

hmm, yes, this was always going to be a contentious point, and highly subjective. I think the main issue here is it was in a large part also a very human impact, which is much harder to measure. I think it was fair to say that our build system at the time was more rigid and testing fully both flows not within our grasp. I discussed this point about 'hope' with a colleague, I think I agree with his conclusion: "Google actually designs some of the chips it uses! And runs its own power stations - I think if anyone could legitimately say "Hope is not a Strategy" it might be them."