by dcurtis
6413 days ago
This thinking has always kind of confused me. Why are customers/users at a big company more important than those at a startup? Just because there are more of them, now you can't make mistakes? If you have the agility to make rapid production changes, you also have the ability to roll back rapidly. So the argument that larger companies require more checks and testing than startups isn't really valid, especially when you consider the costs.
This is just not true. Rollbacks are always more expensive than changes, because you can't rewind time to undo the consequences of having your software be broken for minutes, hours, or days. Worse, in the absence of "checks", the cost of making a production change tends to be roughly constant as the company grows -- it takes the Amazon sysadmin no more time to type "make deploy" than it does me -- but the cost of a rollback scales directly with the size of your company's customer base.
Within a few seconds of Amazon breaking S3, thousands of companies begin to lose money, and they keep losing money second by second until the rollback happens. Even if Amazon is only down for a minute, that's one minute of downtime multiplied by its number of customers. The larger the customer base, the larger the stakes.
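As a rough illustration of the linear part of this argument, here is a minimal sketch assuming a hypothetical flat loss per customer per minute of downtime. The function name and all figures are made up for illustration and are not real Amazon numbers. The effort to deploy is the same at both scales, but the same one-minute outage costs 1000 times more when there are 1000 times as many customers.

```python
def downtime_cost(minutes_down: float, customers: int, loss_per_customer_minute: float) -> float:
    """Hypothetical linear model: every affected customer loses the same
    amount for each minute the service is down."""
    return minutes_down * customers * loss_per_customer_minute


# Illustrative figures only: same outage length, same per-customer loss,
# different customer base.
small = downtime_cost(1, customers=10, loss_per_customer_minute=5.0)
large = downtime_cost(1, customers=10_000, loss_per_customer_minute=5.0)
print(small, large)  # 50.0 50000.0
```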
And, unfortunately, the cost of downtime is nonlinear. If Amazon goes down for a mere two minutes, hundreds of peacefully sleeping system administrators will get emergency pages from their uptime-monitoring systems. They will get out of bed. They will check their logs and their failover mechanisms. They will lose a lot of sleep, and soak up a bunch of overtime pay, and a lot of their good will towards Amazon will dissipate like the morning dew. Once you lose your reputation for quality it takes a lot of work to get it back.
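One hedged way to picture the nonlinearity is a step on top of the linear loss. Once an outage lasts long enough to trip customers' uptime monitors (a hypothetical two-minute threshold, echoing the example above), each affected customer also pays a fixed incident-response cost for the pages, log checks, and overtime. The threshold, the per-customer page cost, and the function name are all assumptions, and the model ignores lost goodwill entirely.

```python
def downtime_cost_with_pages(
    minutes_down: float,
    customers: int,
    loss_per_customer_minute: float,
    page_threshold_minutes: float = 2.0,   # hypothetical monitoring alert threshold
    page_cost_per_customer: float = 200.0,  # hypothetical cost of waking someone up
) -> float:
    """Linear revenue loss plus a one-time jump once the outage is long
    enough to page every customer's on-call sysadmin."""
    cost = minutes_down * customers * loss_per_customer_minute
    if minutes_down >= page_threshold_minutes:
        # Every customer now incurs a fixed incident-response cost.
        cost += customers * page_cost_per_customer
    return cost


one_minute = downtime_cost_with_pages(1, customers=1000, loss_per_customer_minute=5.0)
two_minutes = downtime_cost_with_pages(2, customers=1000, loss_per_customer_minute=5.0)
print(one_minute, two_minutes)  # 5000.0 210000.0
```

Under these made-up numbers, doubling the outage from one minute to two multiplies its cost by 42 rather than by 2.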
This is why larger companies have more controls. The controls are there to try to pass the ever-increasing cost of a rollback back to the team that causes it. The reason it seems so gosh-darned expensive to add a trivial feature to your flagship app is that it is expensive: if the average rollback costs $1m in revenue and every new feature is only 95% reliable (a 5% chance of needing a rollback), every new feature costs the company an expected $50k to deploy.
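The $50k figure is an expected value: the probability that a feature has to be rolled back times the cost of that rollback. A minimal sketch, assuming each feature either works or triggers exactly one full rollback (the function name is hypothetical):

```python
def expected_deploy_cost(rollback_cost: float, reliability: float) -> float:
    """Expected cost of shipping one change: P(rollback) * cost of a rollback.
    Assumes a failed feature triggers exactly one rollback of fixed cost."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be a probability in [0, 1]")
    return (1.0 - reliability) * rollback_cost


cost = expected_deploy_cost(rollback_cost=1_000_000, reliability=0.95)
print(f"${cost:,.0f}")  # $50,000
```

This ignores the cost of building the change itself. Since that cost stays roughly flat while the rollback cost grows with the customer base, the expected-rollback term is what comes to dominate at a large company.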
The secret here is: If you want to deploy changes rapidly, don't work on a product that has a lot of uptime-sensitive customers! Start a different product line, or start a beta program, or found a smaller company.