Hacker News
by wsh 2456 days ago
I wouldn’t accept that as normal. In well-run organizations, when there is a regular, ongoing need for evening or overnight coverage, it’s provided by people scheduled to work during those hours, who are selected and trained to be able to handle most situations on their own.

After-hours calls should come infrequently, or in situations where someone’s personal involvement (for example, as the engineer with primary responsibility for a particular component or its maintenance) is indispensable.

In my experience, things that need a lot of unplanned attention are more likely to fail, if they haven’t already, in ways that have other unacceptable consequences. Fixing them should be a priority for this reason, too.

You haven’t mentioned why you keep getting paged. Is it the same problem repeatedly, or lots of different problems? Is there any hope of addressing the underlying causes?

2 comments

>In well-run organizations, when there is a regular, ongoing need for evening or overnight coverage, it’s provided by people scheduled to work during those hours, who are selected and trained to be able to handle most situations on their own.

It's decently common to have engineering teams on call for their own services, with a regular PagerDuty shift as part of the job. In that case 5-7 alerts per week is pretty healthy. It sucks that you need to keep your work laptop with you and stay sober / within cell coverage, but even then it's pretty rare to catch an actual outage that requires significant attention.

Thanks for your reply - we do have some recurring types of issues, but I'd say it varies a fair bit. A lot of issues are customer support related (and require administrator access to fix), but there are a lot of system issues as well. All are deemed items that need fixing after hours (even if I don't necessarily agree with that assessment).

There are actions being taken to reduce both the number of customer support cases and the systems issues - but progress is slow, and our appetite for implementing all of our customer-requested changes ends up adding lots of new problems.