Internally at Amazon, we attribute about 80% of issues/outages/etc. to changes. This may sound like a "duh," but it's based on more than 10k investigations.
Much of the work is just minimizing the impact of these changes by finding problems before customers do.
This includes things like unit and integration testing, canaries, cellular/zonal/regional deploys, auto rollbacks, multi-hour bakes, automated load tests, and lots and lots of monitoring. Not to mention cross-team code reviews, game days, and ops reviews.
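To make the staged deploy / bake / auto-rollback idea a bit more concrete, here is a minimal sketch, not how Amazon's tooling actually works. Everything in it is a hypothetical stand-in: `deploy_in_waves`, the `deploy`/`rollback`/`error_rate` callbacks, the cell names, and the 1% error threshold are made up for illustration, and the multi-hour "bake" is simulated as a few health samples rather than real waiting.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    succeeded: bool
    deployed: List[str] = field(default_factory=list)
    rolled_back: List[str] = field(default_factory=list)
    failed_wave: Optional[int] = None


def deploy_in_waves(
    waves: List[List[str]],              # hypothetical: e.g. [cell] -> [zones] -> [regions]
    deploy: Callable[[str], None],       # hypothetical hook that pushes the change to one target
    rollback: Callable[[str], None],     # hypothetical hook that reverts one target
    error_rate: Callable[[str], float],  # hypothetical health signal per target
    bake_samples: int = 3,               # stand-in for a multi-hour bake period
    max_error_rate: float = 0.01,        # assumed alarm threshold
) -> RolloutResult:
    result = RolloutResult(succeeded=False)
    for i, wave in enumerate(waves):
        # Push the change to every target in this wave.
        for target in wave:
            deploy(target)
            result.deployed.append(target)
        # "Bake": sample health several times before widening the blast radius.
        for _ in range(bake_samples):
            if any(error_rate(t) > max_error_rate for t in wave):
                # Auto rollback: revert everything deployed so far, newest first.
                for target in reversed(result.deployed):
                    rollback(target)
                    result.rolled_back.append(target)
                result.failed_wave = i
                return result
    result.succeeded = True
    return result
```

For example, with waves `[["cell-1"], ["zone-a", "zone-b"], ["region-2"]]` and a bad error rate reported only by `zone-b`, the rollout stops in the second wave, reverts `zone-b`, `zone-a`, and `cell-1`, and never touches `region-2`. The point of the ordering is that a bad change hits the smallest possible slice first, so the monitoring catches it before most customers do.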
No, I think it's more a matter of how you run a complex system with a lot of people changing stuff at once. Good monitoring, kill switches, staged rollouts, continuous deployment, and so on all contribute more to making a service reliable than how microserviced it is.