by sulam
2217 days ago
I guess it depends on your definition of widespread. It is relatively common, but it's also far from universal, and there are companies that use on call whose experience is vastly different from what many people at Amazon report. In fact, it seems like even different teams inside Amazon have drastically different experiences of on call. Treating it like a fact of life, like the weather, leaves you in a frame of mind that makes it impossible to advocate for change. Your own phrasing, "be the change you want to see in the world", could apply to on call as easily as to anything else, but you won't take that stance if you think it's inescapable.
to give you an example: the team i was on at amazon had 3000 tickets in the queue when i started. anything except sev2s was basically ignored. lower severity tickets would escalate when shit hit the fan. i advocated for fixing classes of issues instead of myopically focusing on one-offs. by the time i left, the queue was down to tens of tickets, mostly feature requests or higher level investigations.
to give you another example: i would basically remove all alerting that was not actionable. the worst possible thing that can happen is to wake up in the middle of the night and not be able to do anything. i would ask for runbooks, and the test was “if i take a developer from another team and put them oncall, can they function independently 95% of the time?” i would think about what the experience of being oncall was like (ie you don’t take people, throw them in the deep end of the pool, and wonder why they drown).
so i guess what i’m saying is that oncall for me wasn’t that bad or stressful. it sucks having to be near a computer, but I was rarely paged for stuff that broke or needed to be fixed right NOW. (once stabilized, our team had 1 sev2 every other week.)