by systemvoltage
1459 days ago
It would be fun to be a fly on the wall when shit hits the fan in general. From nuclear meltdowns to 9/11 ATC recordings, it is fascinating to see how emergencies play out and what goes on in boots-on-the-ground, all-hands-on-deck situations. Like, does Cloudflare have an emergency escalation procedure? What does that look like? How does the CTO get woken up in the middle of the night? How do you get in touch with the most critical engineers? Who first noticed that Cloudflare was down? How do quick decisions get made? Do people get on a giant Zoom call? Or do emails go around? What if they can't get hold of the key people who can flip the switches? Do they have a control room like in the movies? The CTO looking over someone's shoulder, calling "Affirmative, apply the fix," followed by a progress bar painfully moving towards completion.

https://sre.google/resources/book-update/managing-incidents/ is Google-focused, but our flavor of incident response is not too far off.