by sholladay 637 days ago
I would first ask, “Do you really need high uptime at night?” I’ve seen too many small startups do on-call even though their product is about as critical as serving cat pictures and most of their customers are in a nearby time zone. That’s unreasonable unless, maybe, the pay for such a role is equally ridiculous (that is, high) and made clear at the time of hiring. Don’t talk existing engineers into it; show them the terms and let them volunteer.

As for the schedule, I would recommend each engineer have a 3-night shift and then a break for a couple of weeks. Ideally, they will self-assign to certain slots. Early in the week/month might be better/worse for different people.
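
To make the arithmetic concrete, here is a rough Python sketch of that kind of rotation. It assumes plain round-robin assignment rather than self-assignment, and the names, team size, and start date are made up for illustration. The point it shows: with 3-night shifts you need about six people before everyone gets a break of roughly two weeks.

```python
from datetime import date, timedelta


def build_rotation(engineers, start, shifts, nights_per_shift=3):
    """Return (engineer, first_night, last_night) tuples in round-robin order."""
    schedule = []
    for i in range(shifts):
        first = start + timedelta(days=i * nights_per_shift)
        last = first + timedelta(days=nights_per_shift - 1)
        schedule.append((engineers[i % len(engineers)], first, last))
    return schedule


def nights_off_between_shifts(team_size, nights_per_shift=3):
    """Nights each engineer gets off between two of their own shifts."""
    return (team_size - 1) * nights_per_shift


# Hypothetical team and start date, for illustration only.
engineers = ["ana", "ben", "chi", "dev", "eli", "fay"]
rotation = build_rotation(engineers, date(2024, 1, 1), shifts=7)

for name, first, last in rotation:
    print(f"{name}: {first} to {last}")
print("nights off between shifts:", nights_off_between_shifts(len(engineers)))
```

In practice you would let people pick their own slots and only use something like this to fill the gaps or to check that the team is big enough.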

I strongly suggest that engineers not work on ops engineering or past on-call issues while they themselves are on-call. Otherwise, there is a very strong incentive for them to reduce alerts, raise thresholds, and generally make the system more opaque. All such work should be done between on-call shifts, or better yet, by engineers who are never on-call.

One way that on-call engineers can contribute when no incident is ongoing is to write documentation, especially runbooks: what to do when certain types of errors occur, and what to do for disaster recovery.