by daxfohl
103 days ago
You could create an agent template for each incident you've ever had, with its context pre-cached with the postmortem report, the full code change, and any other information about the incident. Then, for every new PR, you could clone agents from all of those templates and ask each one whether the PR could cause something similar to its pre-loaded incident. If any of them says yes, reject the PR unless there's a manual override. You'd never have a repeat incident. Obviously it's probably cost-prohibitive to do an all-to-all analysis for every PR, but I imagine that with some intelligent optimizations around likelihood and similarity analysis, something along those lines would be possible and practical.
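A rough sketch of that gate, assuming a cheap similarity prefilter (plain Jaccard overlap of tokens here, standing in for whatever "similarity analysis" would actually be used) picks the top few incidents before any agent is consulted. `Incident`, `candidate_incidents`, `gate_pr`, and `ask_agent` are all hypothetical names; `ask_agent` is a placeholder for cloning an incident agent and asking it about the PR.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Incident:
    # Hypothetical record: postmortem, code change, and notes flattened into one string.
    incident_id: str
    context: str


def tokens(text: str) -> set[str]:
    # Lowercased word-ish tokens; a crude stand-in for embeddings.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_incidents(
    pr_diff: str, incidents: Iterable[Incident], top_k: int = 5, min_score: float = 0.05
) -> list[Incident]:
    # Prefilter: only incidents that look related to the PR get an (expensive) agent call.
    pr_tokens = tokens(pr_diff)
    scored = [(jaccard(pr_tokens, tokens(i.context)), i) for i in incidents]
    scored = [(s, i) for s, i in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]


def gate_pr(
    pr_diff: str,
    incidents: Iterable[Incident],
    ask_agent: Callable[[Incident, str], bool],  # hypothetical: "could this PR cause this incident?"
    manual_override: bool = False,
    top_k: int = 5,
) -> tuple[bool, list[str]]:
    # Returns (approved, ids of incidents whose agent flagged the PR).
    flagged = [
        i.incident_id
        for i in candidate_incidents(pr_diff, incidents, top_k)
        if ask_agent(i, pr_diff)
    ]
    return manual_override or not flagged, flagged
```

The prefilter is what keeps this from being all-to-all: the number of agent calls per PR is capped at `top_k`, at the cost of missing incidents that share no surface vocabulary with the diff.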
COEs (Correction of Error reports) and Operational Readiness Reviews are already the documents you describe, but they are largely useless at preventing incidents.