by Symbiote
3502 days ago
Step 0: Agree, on the condition that any alert received outside working hours becomes a priority task for someone the next day. That can mean fixing a bug, adjusting the alert, or investigating and explaining why the cause is extremely unlikely to recur. This is working well for me: it has improved the service for users and made our monitoring system much more useful. (Alerts used to be about as specific as "Main website broken"; now they are more like "microservice X is taking >10s to respond".)
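
To make the "specific alert" idea concrete, here is a rough sketch of what such a check could look like. It is not our actual monitoring code: the `Probe` type, the `alert_message` function, and the service name are all made up for illustration, and the 10-second threshold is just the figure from the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold, taken from the ">10s" example; tune per service.
SLOW_RESPONSE_SECONDS = 10.0


@dataclass
class Probe:
    """One response-time measurement for one service (hypothetical type)."""
    service: str
    response_seconds: float


def alert_message(probe: Probe, threshold: float = SLOW_RESPONSE_SECONDS) -> Optional[str]:
    """Return an alert that names the slow service, or None if it is within the threshold."""
    if probe.response_seconds > threshold:
        return (
            f"{probe.service} is taking >{threshold:g}s to respond "
            f"(measured {probe.response_seconds:.1f}s)"
        )
    return None


if __name__ == "__main__":
    # Hypothetical measurement for a hypothetical service.
    print(alert_message(Probe("microservice-x", 12.34)))
```

The point is only that the alert text names the service and the symptom, so whoever picks it up the next day knows where to start.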