Hacker News
by mikkergp 1317 days ago
Yes, but complex systems are always changing and in some ways in a constant process of degrading. A lot of the biggest companies are growing exponentially faster than their processes and it ends up being nearly impossible for the tooling and supporting software to keep up. At that scale all the automation software you buy off the shelf won't scale with you. With over a billion dollars a year in surplus infrastructure costs, I would have to imagine Twitter is at that scale.

At Amazon, near Black Friday and Prime Day, there are company-wide deployment freezes, where no one is allowed to push to prod.
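
To make the idea concrete, here is a rough sketch of what a freeze gate in a deploy pipeline could look like. This is purely illustrative and not Amazon's actual tooling: the names `FREEZE_WINDOWS`, `is_frozen`, and `can_deploy` are hypothetical, and the dates are placeholders.

```python
from datetime import datetime, timezone

# Hypothetical freeze windows in UTC (start inclusive, end exclusive).
# Placeholder dates for illustration only, not real freeze dates.
FREEZE_WINDOWS = [
    (
        datetime(2030, 11, 25, tzinfo=timezone.utc),
        datetime(2030, 12, 3, tzinfo=timezone.utc),
    ),
]


def is_frozen(now, windows=FREEZE_WINDOWS):
    """Return True if `now` falls inside any freeze window."""
    return any(start <= now < end for start, end in windows)


def can_deploy(now, windows=FREEZE_WINDOWS):
    """Allow a production push only when no freeze window is active."""
    return not is_frozen(now, windows)
```

A pipeline could call `can_deploy(datetime.now(timezone.utc))` as its first step and refuse to continue when it returns `False`.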

When I was on call for my team during those freezes, I found there were fewer pages, fewer issues, and the system was generally more stable.

Entropy, which leads to availability problems, grows with the rate of production changes.

If no one touches the code, my guess is the system is more stable rather than less.

Of course this is true, and it isn't really a surprise. Very few outages are caused by existing code in a system that was otherwise working perfectly. It's almost always due to some change, whether a bug in newly deployed code, a bad config update, or something else. An untouched system is absolutely more stable than one in flux.

Of course the solution can't really be "let's not deploy anything, just to be safe", because then your competitors are going to launch new features and leave your product behind.

I think Uber published a study about this. Deployments and changes cause most issues.