by elmerfud
969 days ago
Maybe the better question is: why is this a problem? I've been in oncall rotations my entire multi-decade career, and in the beginning I was frustrated that I was getting called all the time. Now I understand that the reason for that was bad management/owners. I'm still oncall, but if there's an issue, it triggers a review to make changes so it never happens again. N+1 redundancy on everything, and N+2 on critical systems. A single failure should be a non-issue. It should take multiple failures inside a single system to engage the oncall. Human error should be eliminated by not allowing changes during big holidays; many companies lock down from Thanksgiving to New Year's. While things can and will still happen, properly implemented and managed systems should eliminate most of the oncall issues. The ones that do come up will be an "all hands on deck" kind of thing. People will be less afraid of oncall when they know they rarely get called.