Hacker News new | ask | show | jobs
by chtitux 1839 days ago
If you can afford a one off 1 second of latency for your SQL queries, then using logical replication with pgbouncer seems way easier :

- setup logical replication between the old and the new server (limitations exist on what is replicated, read the docs)

- PAUSE the pgbouncer (virtual) database. Your app will hang, but not disconnect from pgbouncer

- Copy the sequences from the old to new server. Sequences are not replicated with logical replication

- RESUME the pgbouncer virtual database.

You're done. If everything is automated, your app will see a temporary increase of the SQL latency. But they will keep their TCP connections, so virtually no outage.

8 comments

This works wonderfully. If you have any long running queries, though, the PAUSE won't pause until they have finished.

I love pgbouncer, it is such a great tool.

this is interesting. So you have a weakest link kind of problem here.
You can temporarily reduce query timeout to a smaller setting as part of the automated failover. The long running transactions will fail but you can minimize the window where you can't talk to postgres
Not really, new connections will block as it's pausing. But you won't be able to shut down Postgres until those long queries complete. Perhaps I was not super clear, but what I'm trying to say is that PAUSE is not instantaneous.
yeah what I'm saying is that you can only pause as fast as your slowest currently initiated query. So if you have a diverse set of query patterns, you could be waiting for a really small percentage of small queries to wrap up.
I understand now. Yes, this is true and should be taken into consideration during upgrades.

Most of our queries are fast, and the longer ones hit a read-only secondary instance. Cleanup jobs, though, can take hours to complete.

To be fair about this page, this was used to migrate versions of postgres __prior__ to the introduction of logical replication. Logical replication makes this significantly easier (ie you no longer need the triggers)
Exactly this. The OP’s approach reminded me so much of the days of Slony, and I wondered why a simpler approach with logical replication would not just suffice.

Rather than pgbouncer, I did this in the actual application code once (write to both databases at the same time, once everything is in sync and you’re confident the new server works well, fail over to the new one only), but it depends upon how much control you can exercise over the application code.

Any approach that is based on triggers makes me shiver, however.

This is precisely the migration I'm planning on doing in the next few weeks with pglogical under the hood for replication. Seems like the atomic switch is much easier than any sort of problem that could stem from conflict or data duplication errors while in a bi-directional replication strategy.
Yep, you can also prepare the new database by using a snapshot of the primary's volume, and use pg-rewind to get them in sync. Altogether the tooling can make migrations super easy without minimal downtime.
I use pgbouncer and had no idea it supported logical replication. I cant find anything about it in the docs. Do you have something you can link me to to read more?
I’m assuming they mean to use Postgres logical replication.
Which is only possible if you are using a version of postgres which is new enough, and isn't restricted, such as some versions of RDS. Which, explains the whole original post.
Does PAUSE works in session mode? I have always had challenges with PAUSE in session mode.
That's how you do it.