by rbanffy
2181 days ago
> If I make a deploy that turns out to be buggy

Unless your deploy reconfigures some networking component and makes a large part of your network inaccessible. Then you need to fix the network issue before you can roll back to a previous version. That may require someone driving to a datacenter and logging into a racked server. And then you may need to restore data if the network misconfiguration caused data to be corrupted somehow (I admit this is getting a bit worst-possible-case-scenario) and, if data got crossed, so that one client could see another client's data, you'll need to block access until you are sure everything is where it should be. Finally, depending on your scale, the deploy of a new version can take a long time by itself. People often deploy new features deactivated, then, once the whole fleet is updated, activate them for different groups and monitor for breaking changes in behavior.
Even better - then they shouldn't even need to make another deploy, just flip the feature flag back off. And if you need to make network changes, test them behind a load balancer in parallel with the existing topology, so you can route progressively more traffic to the new setup and stop as soon as any problems arise. I'm not saying any of this is trivial, but the point is that best practices exist for deploying pretty much any kind of change in a way that can be undone in minutes or even seconds. When you have access to the resources and talent that GitHub has, there are zero acceptable reasons why your site would ever be down or degraded for hours on end - zero.
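
To make the "flip the flag back off" idea concrete, here is a minimal Python sketch of a percentage-based feature flag with a kill switch. Everything in it is hypothetical and for illustration only: the `FeatureFlags` class, its method names, and the `new_checkout` flag are not taken from GitHub or any real flag service, and a real system would keep the flag state in shared configuration rather than in process memory.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-memory flag store (illustration only)."""

    def __init__(self):
        # flag name -> rollout percentage (0 = off for everyone, 100 = on for everyone)
        self._rollout = {}

    def set_rollout(self, flag, percent):
        """Expose the flag to roughly `percent` percent of users."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def kill(self, flag):
        """Turn the flag off for everyone without shipping a new deploy."""
        self._rollout[flag] = 0

    def is_enabled(self, flag, user_id):
        """Return True if this user falls inside the current rollout."""
        percent = self._rollout.get(flag, 0)  # unknown flags default to off
        # Hash flag + user into a stable bucket in [0, 100), so a given user
        # keeps the same answer while the percentage is unchanged, and users
        # who were enabled at 25% stay enabled when the rollout grows to 50%.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


flags = FeatureFlags()

# The code ships dark: the flag exists but nobody sees the feature yet.
flags.set_rollout("new_checkout", 0)

# Once the whole fleet runs the new version, open it to a small group...
flags.set_rollout("new_checkout", 5)

# ...and if monitoring shows breakage, undo it in one step.
flags.kill("new_checkout")
```

The design choice that matters here is that turning the feature off is a configuration change, not a deploy, so how fast you can undo it no longer depends on how long a fleet-wide rollout takes.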