by rozenmd
1099 days ago
It sounds like your team lacks a culture of continuous improvement - IMO, on a product team, the on-call engineer's full-time job is to make the next on-call engineer's job easier by deleting irrelevant alerts, automating fixes, and generally making the system more stable. I wrote a longer guide about this here: https://onlineornot.com/incident-management/on-call/improvin...

I think there should be a nice lightweight tool that gives a clear summary and a tracking mechanism to make this a quicker task, even if it only tags the runbooks that are out of date. All those notes get lost in the documentation and are never referred back to.