by cheald
615 days ago
+1 to all of this. The thing I'd add is that we use Barman for our additional replicas. WAL streaming is very easy to set up with Barman, and we stream to two backups (one onsite, one offsite). The only real costs are bandwidth and disk space, and both are cheap. Compared to running a full replica (with its RAM costs), it's a very economical way to get a robust disaster recovery plan.

If you're doing manual failover, you don't need an odd number of nodes in the cluster. You aren't relying on a quorum to automatically resolve split-brain the way you would with tools like Elasticsearch or redis-sentinel. So for us it comes down to two questions. First, "how long does it take to get back online if we lose the primary?" Answer: as long as it takes to decide that we need to switch and then invoke repmgr switchover. Second, "how robust are we against catastrophic failure?" Answer: we can recover our DB from a very-close-to-live Barman backup in the same DC, or from an offsite DC if the primary DC gets hit by an airplane or something.
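To make the odd-number point concrete: a majority-vote system with n voting nodes needs floor(n/2) + 1 votes to act, so it survives only n minus that many failures. The sketch below is plain arithmetic in Python and does not use any real API. The function names are made up for illustration, and it assumes a generic strict-majority rule rather than the exact election logic of Elasticsearch or redis-sentinel.

```python
def quorum_size(nodes: int) -> int:
    """Votes needed for a strict majority among `nodes` voters."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while the survivors still form a majority."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Show how fault tolerance grows (or doesn't) as the cluster grows.
    for n in range(1, 7):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes buys no extra fault tolerance (both survive one failure), which is why quorum-based setups favor odd counts. With manual failover a human makes the call, so that constraint goes away.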