by caw
4702 days ago
Megacorp sysadmin here. We do on-call in weekly rotations, though technically anyone can get woken up for a service they own. Weekly is easy to schedule, and since the schedule is on the wiki, our boss always knows who the contact is for the week.

Never page unless it's an absolute, dire emergency. One server out of a cluster: next business day (NBD). Failed disk: NBD, unless you're out of hot spares. Automate as much of your work as possible so problems get fixed without you having to touch anything. Service down? Try restarting it. Still down? Then maybe consider an email or a page.

Other stuff:

+ Monthly or quarterly sync-up meetings between everyone on the pager rotation, and doubly so during business-critical periods, to ensure stability.
+ A single email list/PDL for the on-call people (plus their manager) so they can communicate about issues and be cc'd on vendor support tickets (this helps with hand-offs).
+ An FAQ for your services, so you don't have to wake the DBA or web admin until you know it's really hosed.
+ (Sounds silly, but it bears mentioning.) During the pager hand-off, last week's guy and this week's guy should talk about what happened and anything the incoming guy should know.
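The paging policy in this comment boils down to a small triage rule: most failures become a next-business-day ticket, and a human is paged only when redundancy is exhausted or automation can't recover. Here is a minimal sketch under that reading. The alert kinds, the `Alert` fields, and the `restart_service` callback are hypothetical names for illustration, not any monitoring tool's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    AUTO_FIXED = "auto-fixed, no human needed"
    NEXT_BUSINESS_DAY = "ticket for next business day"
    PAGE = "page on-call"


@dataclass
class Alert:
    # Hypothetical alert kinds: "node_down", "disk_failed", "service_down".
    kind: str
    hot_spares_left: int = 0


def triage(alert: Alert, restart_service: Callable[[], bool]) -> Action:
    """Decide whether an alert is worth waking someone up for.

    restart_service is a hypothetical callback that restarts the service
    and returns True if it is healthy afterwards.
    """
    if alert.kind == "node_down":
        # One server out of a cluster: the rest of the cluster carries the load.
        return Action.NEXT_BUSINESS_DAY
    if alert.kind == "disk_failed":
        # Failed disk: NBD, unless no hot spares remain.
        return Action.NEXT_BUSINESS_DAY if alert.hot_spares_left > 0 else Action.PAGE
    if alert.kind == "service_down":
        # Let automation try a restart first; escalate only if it stays down.
        return Action.AUTO_FIXED if restart_service() else Action.PAGE
    # "Never page unless it's a dire emergency": unknown alerts default to a ticket.
    return Action.NEXT_BUSINESS_DAY
```

The default branch deliberately errs toward a ticket rather than a page, so an unclassified alert never wakes anyone up by accident.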

Agreed. We were thinking of doing week-long rotations (Tuesday to Tuesday), with a "hand-off conversation" happening on Tuesdays.
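A Tuesday-to-Tuesday rotation is easy to compute from a fixed roster and a start date. This is a minimal sketch that assumes the roster is cycled in order and that the hand-off happens at 10:00 on Tuesdays. The names, the start date, and the hand-off hour are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical roster, cycled in order, one person per week.
ROSTER = ["alice", "bob", "carol", "dave"]
# Hypothetical start: Tuesday 2013-01-01, hand-off at 10:00 (assumed time).
ROTATION_START = datetime(2013, 1, 1, 10, 0)
WEEK = timedelta(weeks=1)

assert ROTATION_START.weekday() == 1  # Monday is 0, so 1 means Tuesday


def on_call(at: datetime, roster=ROSTER, start=ROTATION_START) -> str:
    """Return who holds the pager at the given moment."""
    weeks_elapsed = (at - start) // WEEK  # timedelta // timedelta -> int
    return roster[weeks_elapsed % len(roster)]


def next_handoff(at: datetime, start=ROTATION_START) -> datetime:
    """Return the next Tuesday hand-off time after the given moment."""
    weeks_elapsed = (at - start) // WEEK
    return start + (weeks_elapsed + 1) * WEEK
```

Anchoring everything to a single start timestamp keeps the schedule reproducible, so the wiki page, the pager tool, and the hand-off meeting invite can all be derived from the same two values.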