by lacker 1991 days ago
Extra compensation or time off is usually a bad idea for on-call responsibilities, because it puts the wrong incentives in place. Teams should be working to improve their infrastructure so that on-call is less painful, not lobbying for additional pay because someone built a hard-to-maintain system.

Sometimes this is an "up the chain" type of problem, but if the other engineers on your team don't agree with you that the on-call rotation is too painful, it's going to be hard to convince management that your judgment is correct.

If you don't want to simply switch teams, my suggestion is to think about what engineering work you could do to improve the on-call experience. Then propose to your manager that you work on those projects. Quantify how much engineering time your projects will save and how much they will improve reliability. In my experience it is far easier to get management to agree to a specific plan to improve the situation than to get management to find someone else to solve a problem for you.

Another idea: since you work at a large company, there are probably teams at your company that handle this very well. Infrastructure teams, for example, may have scaled components that used to be overloaded and are now widely used within the company. Try asking for advice "horizontally": find experts on other teams and ask how they have solved these issues on their teams. These "horizontal" experts will be able to give advice that's specific to your company. This is especially true if your team works on a product area and your coworkers are not specialists in building reliable systems, but your company has infrastructure specialists on other teams.


> Extra compensation or time off is usually a bad idea for on-call responsibilities, because it puts the wrong incentives in place.

I sort of agree, because yes, "just throw a little money at it" is the wrong response. But more money is definitely part of the answer, because unless you negotiated that amount of overtime when you signed up, you're not being paid appropriately for your work.

> think of what engineering work you can do in order to improve the on-call experience

This is key. When you review each incident, it's also important to keep a running summary of all incidents, so you can use the statistics to prioritise your engineering effort properly.
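One way to keep that summary, sketched in Python under some assumptions: each incident review records a root-cause label and the minutes spent responding. The `Incident` fields and the `rank_causes` helper are hypothetical, not any particular tool's API. Ranking causes by total time spent points at where engineering effort is likely to pay off most.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record kept from each incident review.
    cause: str    # root-cause category assigned during the review
    minutes: int  # time spent responding, including after-hours work


def rank_causes(incidents):
    """Group incidents by root cause and return (cause, count, total_minutes)
    tuples, most expensive cause first (ties broken by incident count)."""
    count = defaultdict(int)
    total = defaultdict(int)
    for inc in incidents:
        count[inc.cause] += 1
        total[inc.cause] += inc.minutes
    return sorted(
        ((cause, count[cause], total[cause]) for cause in count),
        key=lambda row: (row[2], row[1]),
        reverse=True,
    )


if __name__ == "__main__":
    # Illustrative data only, not taken from the thread.
    log = [
        Incident("disk full", 45),
        Incident("memory leak", 120),
        Incident("disk full", 30),
        Incident("bad config push", 20),
        Incident("memory leak", 90),
    ]
    for cause, n, minutes in rank_causes(log):
        print(f"{cause}: {n} incidents, {minutes} min")
```

Weighting by time spent rather than raw incident count is a judgment call; a cause that pages rarely but takes hours to fix can matter more than a frequent one-minute restart.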

It sounds as though none of that applies to the OP, and none of it ever can. That means the advice to get out is about all that's left.

In my current job I get a token on-call allowance (~2 hours' pay a week). When something breaks, I'm expected to respond and fix/restart/hack the immediate situation into something that works; then, during normal work hours, analyse the fault, come up with a plan to stop it happening again, and implement that plan. Note that only the immediate fix is "after hours". Some of the fixes are significant: we're rewriting chunks of C++ code in Rust because there are weird memory issues{tm} in the C++ code (because of course there are). Other fixes are trivial: an assert fires, we say "oh, that can actually happen", and we code accordingly.

Right now the on-call allowance feels like money for nothing: we've had two alerts in the last three months, but they're paying that allowance to 3 people every week. The boss says "you're doing very well, keep it up" because in his view no problems is a good thing :)