We have a pretty tight oncall (5 min response time). I think the steps you can take are:
1. Make it clear to your manager this is unacceptable, and you will end up looking for alternate teams/jobs if this goes on
2. Make the same thing clear to your skip level
3. Quit / change teams, citing oncall as the issue
There's no point in doing anything else, in my experience. It's someone else's job to make sure that your oncall experience is prioritized. It sucks to leave an otherwise good job.
For extra credit, try to propose some solutions. Why are some issues not solvable by engineering? Would simply resetting expectations mitigate the largest issues/waking up at night?
> Make it clear to your manager this is unacceptable, and you will end up looking for alternate teams/jobs if this goes on
I'm trying to do this in as harmonious a way as I possibly can, but I'm a bit worried that getting really contentious about it might have negative repercussions. It's possible that I'd "win" and allowances would be made, but it's also possible I'd end up making some real enemies and/or being put on a track out the door.
One hopefully-unusual circumstance here is that most of the rest of my team (and in fact the company) either don't mind the situation much, or at least aren't openly vocal about it, which makes me look like the one nail sticking out that's ready to be hammered back down.
> Quit / change teams, citing oncall as the issue
This is probably the inevitable outcome, unfortunately, although I'll feel bad leaving (and making the rotation even smaller) without having moved anything in the right direction.
> Why are some issues not solvable by engineering? Would simply resetting expectations mitigate the largest issues/waking up at night?
Yeah, agreed. This is the obvious way out if at all possible, but there are many types of alarms where it's fairly difficult. For example: (1) cases where there is a big problem and we get paged essentially as a side effect of one failure causing issues in our part of the system, or (2) catch-all alarms designed to page when something looks suspicious enough to merit human attention, even if it's not a known failure case. There's a strong attitude of erring on the side of paging for any potential issue, so relaxing any of these tends to be a no-go politically.
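
To make those two cases concrete, here is a rough sketch of what routing them differently could look like. Everything in it is hypothetical and for illustration only: the `Alert` type, the `upstream` dependency map, and the "ticket at night" rule are assumptions, not a description of any real alerting setup, and this is exactly the kind of relaxation that tends to be a hard sell.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    service: str     # service whose alarm fired (hypothetical name)
    catch_all: bool  # True for "something looks suspicious" alarms


def route(
    alerts: List[Alert],
    upstream: Dict[str, Set[str]],
    is_night: bool,
) -> List[Tuple[Alert, str]]:
    """Decide an action for each firing alert.

    upstream maps a service to the services it depends on.
    Assumes the dependency map is acyclic; with a cycle, both
    sides would suppress each other and nobody would be paged.
    """
    firing = {a.service for a in alerts}
    decisions = []
    for alert in alerts:
        deps = upstream.get(alert.service, set())
        if deps & firing:
            # Case (1): a dependency is already alerting, so this page
            # is a side effect; its owner gets paged instead.
            action = "suppress"
        elif alert.catch_all and is_night:
            # Case (2): catch-all alarms wait for business hours.
            action = "ticket"
        else:
            action = "page"
        decisions.append((alert, action))
    return decisions


# Example: "api" depends on "db"; "batch" only has a catch-all alarm.
example = route(
    [Alert("db", False), Alert("api", False), Alert("batch", True)],
    upstream={"api": {"db"}},
    is_night=True,
)
```

In the example, only the `db` alert pages someone at night; the `api` alert is suppressed as a downstream symptom and the `batch` catch-all becomes a ticket. The trade-off is the one described above: any real failure hiding behind a suppressed or deferred alert goes unnoticed until someone looks.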