(Throwaway account.) I work for a large software company on an on-call rotation that's been getting more toilsome, and I'm wondering if anyone has been in a similar place. Like many SV companies, we don't compensate on-call, on the rationale that it's part of your engineering duties. I buy this to some degree (someone does have to keep an eye on things), but it's complicated by sizable inequities across the org. _Most_ people have no on-call rotation, many others have a token rotation that's almost never used, and only a handful of teams have rotations that are quite bad. Management has extricated itself completely.

Things have been slowly getting worse. In a gambit to prioritize uptime over engineer time, we have more alarms, tighter tolerances, and a larger operation that generates more tail problems. Good for users, but not so good for us. Being able to sleep through the night is increasingly rare. There are some false positives, but most pages are real and not easily fixed by more engineering. Expected response time has dropped to low single-digit minutes; in theory, you shouldn't be exercising or driving while you're on call.

The scheme works because many engineers are in their 20s and willing to soak up pain like a sponge. Rotations tend to shrink over time as individual engineers make backroom deals to get out, and new blood is added too slowly.

I'm not trying to get myself out, but I want to effect some kind of change. IMO compensation or extra time off would be ideal: not only is it a nod to the cost of on-call, but it also makes swapping shifts easier by adding an incentive beyond simple goodwill. The company could easily afford it, but probably doesn't want to pay for what it can get for free. I have frequent conversations with my manager and get token "yeah, we're looking into it" responses, but it's obviously not a priority for anyone up the chain.

Has anyone else been in a similar position? Are you paid? What did you do? Suck it up? Leave?
Have you had a conversation with your skip-level manager? If so, you are probably right that it's not valued up the chain, and you should leave, because that is a total shit show and not the norm.
If you haven't, ask for time on their calendar, write down your data on on-call wake-up rates and total alarms over time, and let the data make the point that this is not sustainable.
The director should have some options. How big is the rotation? Is the manager in the rotation themselves? When you're on call, are you also expected to contribute story points to the sprint? Why aren't you able to fix the underlying engineering issues that are causing the SLO violations?
If you came to me, I would be shocked and would immediately make a plan with the engineering manager. Any time a person is woken up by an alarm, it's an incident, and every incident needs a response. There needs to be some serious bar-raising, and you can't do it yourself. You need an ally in your management chain, and if you don't have one, you're better off transferring teams or companies.