by rozenmd 637 days ago
I've written a few guides on this. Some quick pointers:

- You build it, you run it

If your team wrote the code, your team ensures the code keeps running.

- Continuously improve your on-call experience

Your on-call staff shouldn't be on feature work during their shift. When they aren't responding to alerts, their job is to improve the on-call experience.

- Good processes make a good on-call experience

In short, write runbooks/standard operating procedures and keep them up to date.

- Have a primary on-call, and a secondary on-call

If your team is big enough, having a secondary on-call (essentially, someone who responds to alerts only during business hours) can help train up newbies and improve the on-call experience even faster.

- Handover between your on-call engineers

A regular mid-week meeting to pass the baton to the next team member ensures that ongoing investigations continue and that nothing falls through the cracks.

- Pay your staff

On-call is additional work, so pay your staff for it (in some jurisdictions, you are legally required to).
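
For the primary/secondary split above, here is a rough sketch of how alert routing could look in Python. The 9-to-5 weekday window, the function names, and paging the secondary first during business hours are all assumptions for illustration; adjust them to whatever your team agrees on.

```python
from datetime import datetime, time

# Assumed business hours; pick whatever window your team actually uses.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def is_business_hours(now: datetime) -> bool:
    """True on Monday-Friday between BUSINESS_START and BUSINESS_END."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def responders(now: datetime, primary: str, secondary: str) -> list[str]:
    """Return who should receive an alert at `now`, in paging order."""
    if is_business_hours(now):
        # Assumption: the secondary goes first so they get practice,
        # with the primary as backup.
        return [secondary, primary]
    # Outside business hours only the primary is paged.
    return [primary]
```

Calling `responders(datetime.now(), "alice", "bob")` gives the paging order for the current moment; the names are placeholders.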

More: https://onlineornot.com/incident-management/on-call/improvin...