Not sure if you have ever been "on-call". If you run a well-run org, the "calls" for on-call people will be few, but the stress of being mentally "available 24/7" during an on-call rotation is greater than the stress of tactically dealing with the random event. So let's assume it's your on-call week and you are supposed to take your SO to dinner for [EVENT] (anniversary, date night, a family thing), etc. Do you know how much emotional and mental stress that puts on the employee?

You want your top ops guy available when you need him in a pinch, even when he is NOT the guy on call but the SME who is the only one who can solve X? Yeah, you'd best treat them well, to ensure that not only THEY BUT THEIR ENTIRE FAMILIES RECOGNIZE THEIR VALUE. How many douche-bag managers ONLY think about their employees' contribution, as opposed to the sacrifices their families actually make for your fucking company? Their kids? Their wives/husbands/relationships? GO FUCK YOURSELF IF YOU THINK IN ANY TERMS OTHER THAN ***HUMAN***.

(I AM TALKING TO IT/OPS ON-CALL CULTURE IN GENERAL, NOT YOU SPECIFICALLY.) If you are an ops/SRE/DevOps/IT manager, heed my comment. This is the reason every employee I have had wants to work with me again. Family first. And if you live alone, family first (you are your family; take care of yourself).
Our team wanted to do root cause analysis and fix problems for good, but there was always some new, very important feature to build, so we never fixed these issues.
The way our on-call escalation worked was something like this: first, the on-call engineer gets the call. If the issue is not resolved within 15 minutes, it is escalated to the team lead. After another 15 or 30 minutes, it is escalated to the manager and then the entire team. After an hour, it is escalated to the manager's manager, and other teams and SMEs have to join the on-call bridge. Then the director, the VPs, etc. And supposedly, if it took long enough, it would escalate all the way to the CEO.
I used to feel really guilty if an issue escalated past my team lead. And if my manager joined the bridge, she would scare us about further escalation and push us to take all kinds of shortcuts, like hard-rebooting servers. She never cared about us finding the root cause.
But every once in a while, an issue would get escalated past her to her boss. And then that issue would become a top priority to fix for good.
Soon the entire team and department learned this, so we stopped trying to fix issues as quickly as possible. We would pretend the server was stuck, or that we had internet connection issues, etc., and wait for the call to escalate as high as possible. Some teammates would even play online games while supposedly troubleshooting.
Eventually, stabilizing our code and environment became the top priority instead of adding new features. We spent a few months squashing all kinds of bugs and added processes like code reviews, unit tests, etc. After those few months of hard work, our off-hours calls dropped by 90%.
That is when I learned that leadership won't care about our lives unless they are affected by their own policies.
At my current company, I do have on-call duty, but I don't change my plans for it. If I am out having dinner with my wife and I get a work call, I acknowledge it but let my manager know that I will look at the issue in a couple of hours. They can either get someone else to look at it or wait a few hours. (Also, because we fought for quality, we rarely get calls after work, and management understands why I would spend time with my family during on-call week.)