by kylek 2456 days ago
Worked at a FAANG, 5-7 was peanuts for the rotation I was on there. The interesting thing (I don't know if I liked it or not) was that when you're on call, that's all you do, even during normal hours: no "normal" work/projects during that time (which relieves a giant burden for everyone NOT on call). At the end of the rotation, there is a proper hand-off to the next on-call; every issue that came up is reviewed and a plan is put in place to fix it "for good" (meaning a backlog task gets created and assigned to someone during the next sprint planning). If there's no planning to root-cause and fix the underlying problems, run.
2 comments

This is super interesting. So you had a high number of pages, but you also had a clearly defined and sensible-sounding way of dealing with the root causes of the pages?

If you're constantly fixing the things causing you to get pages, why are there still many more than one per day? Just prioritisation of other work over fixes?

We have a similar system, though we have one person on after-hours support who does normal work during the day, and one person on during the day who doesn't do normal work. That person works on remediating the issues that cause people to get paged. It leads to a pretty low number of pages.

My rotation was a bit weird. I was on an ops team for a service, but my ops team did not have its own rotation; each of us took part in the various dev team rotations (the theory is nice: the ops team had a deep view of most aspects of the service. I don't think this was common to other service teams). The dev team I took part in was an absolute trainwreck: poorly managed at the team level and one level above (the owners/managers of the service), and more concerned with getting features out and burning through people to make progress. The issues were always brought up and root-caused properly, but poor architecture led to a lot of "well, we can't do that until x happens". I should reiterate that I'm no longer at the company - definitely wasn't the place for me (and my sanity)!

Thanks - I'm glad I got at least one reply from someone confirming that this level of paging wasn't abnormally high for them.

When on primary on-call, we are also generally not expected to make progress on project work, although we don't review all our incidents after our shift (generally just the major ones). I think there's definitely room for improvement here.