by singron
1211 days ago
Most of it is cellular or regional, but there are a few critical global services. Global network load balancing, network QoS, and DDoS prevention are more functional because they are global (i.e. you couldn't replace them with equivalent regional versions), but they are often the cause of issues like this. There was a push a few years ago to ensure global services had at least 99.999% uptime or to make them regional. This was a 48-minute outage, so it blows the five-nines budget (about 5.3 minutes of downtime per year) for roughly 9 years. Ex-Googler, no particular knowledge of this event; information might be out of date.
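For anyone who wants to check the budget arithmetic, here is a rough sketch. The function names are made up for illustration, and it assumes the budget is counted per calendar year of 365.25 days, which may not match how any real SLO is actually measured.

```python
# Back-of-the-envelope error budget math for an availability target.
# Assumption: the budget is measured over a 365.25-day year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for a target such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def years_of_budget_consumed(outage_minutes: float, availability: float) -> float:
    """How many years of downtime budget a single outage uses up."""
    return outage_minutes / downtime_budget_minutes(availability)


budget = downtime_budget_minutes(0.99999)
years = years_of_budget_consumed(48, 0.99999)
print(f"{budget:.2f} min/year, {years:.1f} years")
# prints "5.26 min/year, 9.1 years"
```

So a single 48-minute outage eats a little over nine years of a five-nines budget, which matches the figure above.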
|
1. Some networking-related service has global, non-standard (compared to the rest of the company) configuration
2. The relevant VP is aware and has decided not to change it because the change was described as impossible
3. Some change elsewhere happens that assumes standard configuration
4. The networking service breaks and causes a global outage
5. VP is told to fix it
6. Fix rolls out in weeks, because it wasn't as hard as they said before