by nelsondev 1326 days ago
At Amazon, near Black Friday and Prime Day, there are company-wide deployment freezes, where no one is allowed to push to prod.

When I was on call for my team during those freezes, I found there were fewer pages, fewer issues, and the system was generally more stable.

Entropy, which leads to availability problems, grows with the rate of production changes.

If no one touches the code, my guess is the system is more stable rather than less.

2 comments

Of course this is true, and isn't really a surprise. Very few outages are caused by existing code in a system that was otherwise working perfectly. It's almost always due to some change – whether a bug in newly deployed code, a bad config update, or whatever else. An untouched system is absolutely more stable than one in flux.

Of course the solution can't really be "let's not deploy anything, just to be safe", because then your competitors are going to launch new features and leave your product behind.

I think Uber published a study about this. Deployments and changes cause most issues.