Having dealt with all of these in production, I can tell you the strategies I've used to combat them:

1. Solid code reviews. Any one of our developers can halt a code review for any reason. We require 3 approvers on each review, and sensitive areas require reviews from people familiar with that area. We also have tooling that generates test data in dev at volumes similar to production loads, which helps us catch a lot of time bombs.
2. Feature toggles to decouple deploying code from releasing it. This lets us test our code in production before turning it on for customers. It also lets us roll out a feature slowly and watch how the code behaves, and it gives us a kill switch to turn the code off if it is bad.
3. An incredibly robust testing pipeline. It takes about 50 minutes from commit to production deployment. We can also redeploy previous containers very quickly when a situation requires it.

This doesn't solve all of our problems. Some changes cannot go behind feature toggles (DB migrations, dependency upgrades, etc.), but we pay a lot of attention to design and rollout plans for database migrations and similar changes.

All of these things come at an extra cost to us, but they allow us to move quickly when we need to, and we're in a much better place than we were when we were trying to do weekly releases. We have a good mix of team experience (senior vs. junior) and a lot of discipline in our software engineering practices. We still have problems, as I said, but these strategies have greatly improved our ability to deliver software.