by jorgeortiz85
3909 days ago
Yup. This is one of our few remaining unsharded databases (legacy problems...), so we can't easily canary a fraction of serving capacity. However, one clear remediation we can implement easily is to have our tooling change a replica first, fail over to it as primary, and, if problems are detected, quickly fail back to the healthy former primary. Lesson learned. We'll be reviewing all of our database tooling to make sure changes are always canaried or easily reversible.
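
To make the replica-first flow concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not our actual tooling: the `Node` class, the `rollout` function, and the `is_healthy` callback are all hypothetical names, and a real failover involves replication state and traffic routing that this toy model ignores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a database server (not a real API)."""
    name: str
    changes: List[str] = field(default_factory=list)


def rollout(primary: Node, replica: Node, change: str,
            is_healthy: Callable[[Node], bool]) -> Node:
    """Apply `change` replica-first and return the node that ends up as primary."""
    # 1. Change only the replica; the current primary stays untouched.
    replica.changes.append(change)

    # 2. Fail over: the changed replica becomes the new primary.
    new_primary, former_primary = replica, primary

    # 3. If problems are detected, fail back to the healthy former primary.
    if not is_healthy(new_primary):
        return former_primary

    # 4. Otherwise keep the new primary and bring the old one up to date.
    former_primary.changes.append(change)
    return new_primary


# Usage: a change that the (hypothetical) health check rejects.
a, b = Node("db-a"), Node("db-b")
serving = rollout(a, b, "bad-change",
                  is_healthy=lambda n: "bad-change" not in n.changes)
print(serving.name)  # prints "db-a": we failed back to the former primary
```

The point of the ordering is that the former primary never receives the change until the new primary has passed its health check, so failing back is always a move to a known-good node.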