by rantwasp 2218 days ago
hmm. no. you do have agency. you have agency while writing the code. you have agency during code review. you have agency when wiring up alerts (if you do stupid shit you will receive a lot of pages. don’t do stupid shit). you have agency during the events. you also have agency after the events when you do a coe/post mortem. you can definitely prioritize things that improve the quality of life of other developers.

to give you an example. the team i was in at amazon had 3000 tickets in the queue when i started. anything except sev2s was basically ignored. lower severity tickets would only escalate when shit hit the fan. i advocated for fixing classes of issues instead of myopically focusing on one-offs. by the time i left, the queue was down to tens of tickets, mostly feature requests or higher level investigations.

to give you another example: i would basically remove all alerting that was not actionable. the worst possible thing that can happen to you is to be woken up in the middle of the night and not be able to do anything. i would ask for runbooks and the test was “if i take a developer from another team and put them oncall can they function independently 95% of the time”. i would think about what the experience of being oncall was like (ie you don’t take people and throw them in the deep end of the pool and wonder why they drown).

so i guess what i’m saying is that oncall for me wasn’t that bad or stressful. it sucks having to be near a computer but i was rarely paged for stuff that broke or needed to be fixed right NOW. (once things stabilized, our team had 1 sev2 every other week)

1 comments

Okay, at the risk of putting way too much effort into this, I'm going to assume you have 5 people on your team, so you're on call roughly 10 weeks a year. At 1 sev2 every other week, about 5 of those weeks would include a sev2. Assuming sev2s are uniformly distributed over the week, roughly 80% of them land outside normal working hours, so call it 4 times a year that your work-life balance is negatively affected by on-call.
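
For what it's worth, the back-of-the-envelope estimate above can be sketched in a few lines of Python. Only the team size of 5 and the rate of one sev2 every other week come from the thread; the function name, the 40-hour work week, and the 168-hour week used to get the after-hours fraction are assumptions made for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def after_hours_sev2s_per_year(team_size, sev2s_per_week, work_hours_per_week=40):
    """Rough count of sev2s per year that hit one person outside working hours.

    Assumes a weekly rotation shared evenly across the team and sev2s
    spread uniformly over the week (both are simplifying assumptions).
    """
    weeks_oncall = 52 / team_size                      # weeks on call per person per year
    sev2s_while_oncall = weeks_oncall * sev2s_per_week # sev2s that land on your shifts
    after_hours_fraction = 1 - work_hours_per_week / HOURS_PER_WEEK
    return sev2s_while_oncall * after_hours_fraction


# Team of 5, one sev2 every other week (0.5 per week).
estimate = after_hours_sev2s_per_year(team_size=5, sev2s_per_week=0.5)
print(round(estimate, 1))
```

With these inputs the after-hours fraction comes out to about 76%, close to the ~80% guessed above, and the result rounds to about 4 per year.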
you need to generate some 3d graphs and after that write a paper about using gradient descent to improve the WLB of SW developers :)

our team had 7 people and at some point we started sharing the oncall load with a sister team (ie you were oncall for the services on both teams), meaning that you would be oncall roughly once every 3 months. not ideal - but not the worst thing in the world.