by ipnon
631 days ago
The simplest solution is to compensate the on-call engineer, either by paying them twice their hourly rate for each hour on call, or by crediting them an hour of vacation time for each hour on call. This works because it gives every party an incentive to minimize the time spent on alert. Management is incentivized to minimize it because fixing the root-cause issues is now cheaper than having engineers play firefighter on weekends. In the long term, which is always the only relevant timeline, this saves money by reducing engineer burnout and churn.

Engineers are also incentivized to self-organize. Those who have more free time or want more compensation can volunteer for more on-call, while those with stricter obligations outside of work can spend less time on call, or ideally none at all. In this scenario, even if the root cause is never addressed, the local "hero" usually becomes so inundated with money and vacation time that everyone is happy anyway. It doesn't completely eliminate the need for on-call or the headaches that alerts inevitably cause, but it helps align seemingly opposing parties in a constructive way. Thanks to Will Larson for suggesting this solution in his book "An Elegant Puzzle."
I'm not sure we're all on the same page here, but let me give you an example of how on-call essentially works on my team.
- Week-long rotations spread across the year among team members.
- On-call means holding the pager but also taking in any non-urgent requests that can be handled within a reasonable time. New feature requests are out of scope; answering a bug report from support is in scope, including a fix if that's possible.
- At night, responding only to paging alerts. Some teams I've been on had sister teams in other regions whose on-call covered part of the night.
- Generally, paging alerts are rare enough (once or twice a week) that out-of-hours disruption is fairly low.
- Non-urgent breakages, bug reports, etc. are fairly common, though.
Someone has to handle all that, so it's a rotation. I don't think giving engineers incentives to take more on-call is practical, unless you're okay with them stagnating in their careers. And it's the EM asking here, so I'd hope they don't want that.