The global outage drinking game:
bad DNS config push - 1 shot
routing loop - 2 shots
third party advertising your routes - 3 shots
power outage at data centre it turns out everything depends on despite decades and millions in engineering to avoid precisely that - 4 shots
Wolves ate through fiber - 5 shots
And it was a full moon - 6 shots
Single service failure, but service has not been restarted in 5 years, and no longer restarts in any documented fashion - 7 shots
And service developers left the company to found a startup - 8 shots
Expired internal SSL certificate - 9 shots
Daylight savings changeover-induced database corruption - 10 shots
Windows Update - 11 shots
My favorite can’t-restart story is “the shared password manager is down, we need a hardware crypto key to restart it, the key is in the safe, and the safe’s combination was in the shared password manager.”