Hi folks, being on-call has become one of the most painful parts of my job as a software engineer nowadays. I have spent a lot of stressful weeks completely demotivated by how much time goes into these issues, time that could be spent on innovation. So I have ranked my top issues below. Do others feel the same pain? I also wonder whether a solution could be built for these.

1. I don't have enough information in the alert to jump right into the resolution.
2. It's not easy to find similar alerts triggered recently, so I can't easily go back and see how they were fixed.
3. Most of the time I don't find runbooks useful, because they are not up to date.
4. I don't know whether any recently merged changes caused these alerts/incidents.
5. A lot of the time, I don't know whom to reach out to when the alert belongs to another team.
6. I have to go to multiple systems to update statuses or notes.
7. I have to summarize all the details again in the on-call handoff summary doc at the end of the rotation.
There's another problem (#8 to add to the list) that I also felt pain from: how you're scheduled to work oncall. We had ad-hoc, manual scheduling of who would be oncall and when. A tool for solving that is https://oncallscheduler.com (which I am affiliated with). It automates oncall scheduling, makes it fair and predictable, and gives all engineers self-service control over when and how they're scheduled. I'd love some feedback on it.