by devonkim 2443 days ago
A high-availability Postgres setup is the minimum for a production system, and for a staging system too, so that you can understand how your system behaves during a failure event. These failure scenarios should be tested, not necessarily on every commit, but often enough to be confident that during a failover you’re not dropping queries on the floor and pretending it’s all good, and that your monitoring systems actually report the event.
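As a rough sketch of what such a drill could measure, here is a toy harness that pushes queries through a simulated outage and counts how many succeed, how many needed a retry, and how many were dropped. Everything in it is hypothetical: `FlakyPrimary` is a stand-in for a real endpoint, not a Postgres client, and `run_drill` and the retry policy are assumptions made purely for illustration.

```python
class FlakyPrimary:
    """Hypothetical stand-in for a database endpoint.

    Calls numbered outage_start <= n < outage_end fail, which mimics the
    window in which a failover is in progress.
    """

    def __init__(self, outage_start, outage_end):
        self.calls = 0
        self.outage_start = outage_start
        self.outage_end = outage_end

    def execute(self, query):
        self.calls += 1
        if self.outage_start <= self.calls < self.outage_end:
            raise ConnectionError("primary unavailable (simulated failover)")
        return "ok"


def run_drill(db, total_queries, max_retries):
    """Send queries through the simulated outage and tally the outcomes."""
    stats = {"succeeded": 0, "retried": 0, "dropped": 0}
    for i in range(total_queries):
        for attempt in range(max_retries + 1):
            try:
                db.execute(f"SELECT {i}")
            except ConnectionError:
                continue  # try again until the retry budget is spent
            stats["succeeded"] += 1
            if attempt > 0:
                stats["retried"] += 1  # succeeded, but only after a retry
            break
        else:
            # Every attempt failed: this query was dropped on the floor.
            stats["dropped"] += 1
    return stats


if __name__ == "__main__":
    print(run_drill(FlakyPrimary(outage_start=5, outage_end=10), 20, max_retries=2))
```

In a real drill the same counters would come from an actual client running against staging while a failover is triggered, and a nonzero `dropped` count is the number to alert on.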

Yeah. I guess after having been bitten myself a few times by failed MySQL failovers, and especially after having read the GitHub October 2018 incident postmortem [1], I stopped considering failover a reliable availability solution altogether.

However, this is just a personal opinion that I might revisit at some point.

[1] https://github.blog/2018-10-30-oct21-post-incident-analysis/

High-availability setups are also absolutely required to upgrade or patch running databases without significant downtime. Spending engineering and business time trying to work around these issues is a cost from the 90s and has no place in a modern business environment. Heck, commercial, proprietary DBs had HA figured out decades before then. OSS tools are much more reliable now than even 4 years ago, to the extent that few people talk about it anymore. Mistakes and bugs are definitely possible, but the number of _successful_ failover and failback events must be considered in the calculus.

Upgrades aren’t to be taken lightly, of course, but again, they’re now a cost of doing business and a reality that we need to engineer for properly.