by grogenaut
2247 days ago
Internal to Amazon we consider that about 80% of issues/outages/etc. are due to changes. This may sound like "duh", but it comes from over 10k investigations. Much of the work is just minimizing the impact of these changes by finding problems before customers do. This includes things like unit and integration testing, canaries, cellular/zonal/regional deploys, auto rollbacks, multi-hour bakes, auto load tests, and lots and lots of monitoring. Not to mention cross-team code reviews, game days, and ops reviews.
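To make the "staged deploy + bake + auto rollback" idea concrete, here is a rough sketch in Python. It is not how Amazon's tooling works. The function name `staged_rollout`, the callbacks, the stage names, and the 1% error threshold are all made up for illustration, and a real bake would watch alarms for hours instead of reading one metric sample.

```python
from typing import Callable, Dict, List


def staged_rollout(
    stages: List[str],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.01,  # hypothetical threshold
) -> bool:
    """Deploy one stage at a time; undo everything if a stage looks unhealthy."""
    done: List[str] = []
    for stage in stages:
        deploy(stage)
        done.append(stage)
        # Stand-in for a bake period: a real system would wait and watch
        # monitoring here before widening the blast radius.
        if error_rate(stage) > max_error_rate:
            # Auto rollback: revert the newest stage first.
            for deployed in reversed(done):
                rollback(deployed)
            return False
    return True


# Toy usage with in-memory fakes (all values are invented).
live: Dict[str, str] = {}
fake_errors = {"canary": 0.001, "cell-1": 0.2, "region": 0.001}

ok = staged_rollout(
    stages=["canary", "cell-1", "region"],
    deploy=lambda s: live.__setitem__(s, "v2"),
    rollback=lambda s: live.pop(s, None),
    error_rate=lambda s: fake_errors[s],
)
print(ok, live)
```

In this toy run the bad change is caught at `cell-1`, so the `region` stage never sees it and both earlier stages are reverted. That is the whole point of ordering deploys from smallest to largest scope.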