Hacker News new | ask | show | jobs
by matsur 1463 days ago
We use a phased rollout process for all routine changes (like this one). Once a change has passed peer review and the "dry-run", changes are rolled out to progressively larger slices of our production environment, with monitoring systems and engineers watching for adverse effects.

The specific network locations that were impacted by this change were amongst the last to see the change rolled out. One deficiency in our deployment strategy (which we will correct) is that no network locations in the affected "MCP" configuration received the change early in our rollout process. If that had been the case, we would have found the problem much earlier and the incident's impact would have been much reduced.