Hacker News
by anon743448 957 days ago
I worked at one place where we had to deal with a lot of off-hours on-call issues. Our manager focused on quick resolution and never let an issue escalate past her. Most of the time she wanted us to simply reboot the service or the server. We never had time to find the root cause.

It happened a lot and affected our personal lives significantly: we were waking up in the middle of the night just to restart some service.

Over time, the whole team started ignoring on-call alerts and working really slowly, pretending that our internet was down or that a laptop wouldn't boot. The longer an alert took to resolve, the further up the chain our automated systems would send it. It also started to impact our SLA and other metrics.

Finally, they decided to allocate resources to fix and stabilize the system.