by patio11
5582 days ago
I feel for you. On the plus side, process improvements to prevent it from happening next time are exactly how you should respond to things like this. One which has saved my bacon numerous times is investing a few hours into tweaking monitoring and alert systems. I hear PagerDuty exists to help with this. I use a bunch of scripts and bubblegum, and even that caught 10 of the last 12 big problems. Queuing systems dying has hosed me many times over the years, for example, and a borked deploy which causes that would have my phone ringing before I got my laptop closed.