by Silhouette
2540 days ago
In that case you would probably still roll back to prevent further data corruption, and restore the corrupted records from backups. OK, but what if it's new data being stored in real time, so there isn't any previous backup with the data in its intended form? Here we're talking about Stripe, which presumably processes a high volume of financial transactions even in just a few minutes. Obviously there is no good option if your choice is between blocking some or all of your new transactions and losing data about some of your previous transactions, but it doesn't seem unreasonable to do at least some cursory checking of whether you're about to cause the latter before you roll back.

Rollbacks should always be safe, and they should always be tested automatically. So a software release should do a gradual rollout (i.e., 1, 10, 100, 1000 servers), but it should also restart a few servers on the old software version just to check that a rollback still works.
The rollout should fail if health checks (including business metrics such as conversion rates) fail on either the new release or the old release.
If only the new release fails, a rollback should be initiated automatically.
If only the old release fails, the system is in a fragile but still working state, and a human should decide what to do.
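A minimal Python sketch of the rollout logic described above, under stated assumptions: the names `run_rollout`, `decide`, `Decision` and `old_canaries` are hypothetical, the `deploy_new`/`restart_old`/`healthy` callables stand in for whatever deployment tooling and monitoring you actually have, and `healthy` is assumed to already fold in business metrics such as conversion rate. The comment doesn't say what to do when both releases fail; this sketch assumes that case is also left to a human.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Sequence


class Decision(Enum):
    CONTINUE = "continue"              # both versions healthy: widen the rollout
    ROLLBACK = "rollback"              # only the new release is unhealthy
    HOLD_FOR_HUMAN = "hold_for_human"  # old release unhealthy: rollback path unproven


def decide(new_ok: bool, old_ok: bool) -> Decision:
    """Map the health of the new and old releases to a rollout decision."""
    if new_ok and old_ok:
        return Decision.CONTINUE
    if old_ok:
        # Only the new release failed, and the old version just restarted cleanly,
        # so an automatic rollback is known to work.
        return Decision.ROLLBACK
    # The old release failed (assumption: even if the new one also failed),
    # so rolling back is not known to be safe; leave it to a human.
    return Decision.HOLD_FOR_HUMAN


def run_rollout(
    stages: Sequence[int],
    deploy_new: Callable[[int], None],
    restart_old: Callable[[int], None],
    healthy: Callable[[str], bool],
    old_canaries: int = 2,
) -> tuple[Decision, int]:
    """Widen the rollout stage by stage; return the decision and the last stage size."""
    for size in stages:
        deploy_new(size)            # hypothetical hook: run the new release on `size` servers
        restart_old(old_canaries)   # hypothetical hook: restart a few servers on the old release
        decision = decide(healthy("new"), healthy("old"))
        if decision is not Decision.CONTINUE:
            return decision, size
    return Decision.CONTINUE, stages[-1]


# Usage with fake hooks: the new release starts failing once it reaches 100 servers.
deployed: list[int] = []
result = run_rollout(
    [1, 10, 100, 1000],
    deploy_new=deployed.append,
    restart_old=lambda n: None,
    healthy=lambda version: not (version == "new" and deployed[-1] >= 100),
)
print(result)  # (<Decision.ROLLBACK: 'rollback'>, 100)
```

Re-checking the old version at every stage, rather than only once before the first stage, is a design choice in this sketch: it keeps verifying that the rollback path still works while the new release spreads.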