by lbriner
615 days ago
> How do I check replica lagging?

I use the Prometheus exporter for Postgres.

> How would I monitor the replica?

Same. You can also use something like HAProxy calling a Postgres CLI command to connect to the instance.

> How do I failover?

Mostly, you probably want to do this manually, because there can be data loss and you want to make sure the risk is worth it. I simply use repmgr for this.

> Do I need 2 replicas?

It's usually good to have at least 3 nodes (1 master and 2 slaves), mostly so that if one fails you still have 2 remaining, i.e. time to get a 3rd back online.

> How do I failback?

Again, very easy with repmgr: you just tell the original primary to be the primary again. The failed-over primary gets stopped, the original primary gets fast-forwarded and promoted to primary, and everything else gets told to follow.

I do agree that this space for Postgres is very fragmented and some tools appear abandoned, but it's pretty straightforward with just Postgres + barman + repmgr. I have a series of videos on YouTube if you are interested, but I am not a Postgres expert so please no hating :-) https://youtu.be/YM41mLZQxzE
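On the replica-lag question, here is a rough sketch of what a lag check boils down to if you want to look at it by hand rather than through an exporter. The SQL assumes PostgreSQL 10 or later and is meant to be run on the primary; the Python helpers, their names, and the 16 MiB threshold are my own assumptions for illustration, not anything repmgr or the exporter ships.

```python
# Sketch only: run this query on the primary (PostgreSQL 10+).
# pg_stat_replication has one row per connected standby.
REPLICA_LAG_SQL = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_replay_lsn)


def is_lagging(primary_lsn: str, replica_replay_lsn: str,
               threshold_bytes: int = 16 * 1024 * 1024) -> bool:
    """True if the replica is behind by more than threshold_bytes.

    The 16 MiB default is a hypothetical alert threshold; pick your own.
    """
    return lag_bytes(primary_lsn, replica_replay_lsn) > threshold_bytes
```

The helpers do the same arithmetic as `pg_wal_lsn_diff`, which is handy if all you have are two LSN strings copied from different nodes. On a replica itself, `now() - pg_last_xact_replay_timestamp()` gives a time-based view instead, though it keeps growing while the primary is idle.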
If you're doing manual failover, you don't need an odd number of nodes in the cluster, since you aren't relying on quorum to resolve split-brain automatically the way you would with tools like Elasticsearch or redis-sentinel. So for us it comes down to two questions. "How long does it take to get back online if we lose the primary?" As long as it takes to determine that we need to do a switch and invoke repmgr switchover. "How robust are we against catastrophic failure?" We can recover our DB from a very-close-to-live barman backup from the same DC, or from an offsite DC if the primary DC got hit by an airplane or something.
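To spell out why the odd-number rule only matters for automatic failover, here is a small sketch of the arithmetic, assuming a plain majority quorum. The function names are mine, and real tools such as Elasticsearch or redis-sentinel have their own configurable quorum rules, so treat this as an illustration rather than a description of how any of them work.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return nodes - quorum(nodes)


if __name__ == "__main__":
    # Print cluster size, majority size, and tolerated failures side by side.
    for n in range(2, 6):
        print(n, quorum(n), tolerated_failures(n))
```

With 3 nodes you can lose 1 and still have a majority, and with 4 nodes you can still only lose 1, so the extra even node buys nothing for quorum purposes. With a human making the failover decision none of that applies, and the node count is purely about how much spare capacity you want.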