by ravedave5
685 days ago
The goal for on-call should be to NEVER get called. If someone gets called while on call, their #1 task the next day is to make sure that call never happens again. That means either fixing the false alarm or tracking down the root cause of the call. Eventually you reach a state where being called is the rare exception rather than the norm.
We deployed a new system and ran a one-week on-call rotation across five team members. The first couple of rotations were hell: almost every night brought at least one wake-up call. As we learned how to resolve each type of outage, we taught the first-line staff how to reboot the right components so we didn’t get woken up as often, while we spent our days fixing the underlying bugs. Eventually the system stopped crashing.
The on-call pay was really good (nearly double for that week) and it was a pretty sweet reward to be able to rake that in as calls stopped coming. We broke out a bottle of champagne when the first week of no calls had passed.
Eventually on-call was cancelled.
Imagine how this story would have ended if management had incentivized us differently, for example by paying the extra only for nights when we actually got paged.