by xkcd-sucks 2247 days ago
In an ideal world, pushing new features would have no impact on stable, mature features like browsing files, comment threads, etc.
5 comments

Internal to Amazon we consider that about 80% of issues/outages/etc. are due to changes. This may sound like a "duh", but it is based on more than 10k investigations.

Much of the work is just minimizing the impact of these changes by finding them before customers do.

This includes things like unit and integration testing, canaries, cellular/zonal/regional deploys, auto rollbacks, multi-hour bakes, auto load tests, and lots of monitoring. Not to mention cross-team code reviews, game days, and ops reviews.
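
To make the staged deploy plus auto rollback idea concrete, here is a rough sketch. It is not any real deployment tool: the function name, the stage names, and the `deploy`, `rollback`, and `error_rate` callbacks are all hypothetical, and a real pipeline would bake for minutes or hours at each stage before checking health.

```python
from typing import Callable, List


def staged_rollout(
    stages: List[str],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.01,  # hypothetical threshold, tune per service
) -> bool:
    """Deploy one stage at a time; undo everything if a stage looks unhealthy.

    Returns True if every stage was deployed and passed its health check,
    False if a rollback was triggered.
    """
    deployed: List[str] = []
    for stage in stages:
        deploy(stage)
        deployed.append(stage)
        # A real pipeline would wait (bake) here before reading metrics.
        if error_rate(stage) > max_error_rate:
            # Roll back in reverse order, most recent stage first.
            for done in reversed(deployed):
                rollback(done)
            return False
    return True
```

The point of ordering stages from smallest blast radius to largest (say a canary, then one zone, then a region) is that a bad change gets caught while it only affects a small slice of customers.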

Ideal I agree ... and yet the real world is exactly the opposite ;)
That’s part of the micro service promised land, right?
No, I think it's more part of how to run a complex system with a lot of people changing stuff at once. Good monitoring, kill switches, staged rollouts, continuous deployment, and so on all contribute more to making a service reliable than how microserviced it is.
Too bad this is the real world.
Which world is that?