Hacker News new | ask | show | jobs
by samokhvalov 925 days ago
> You can create a replication slot, take a snapshot, restore the snapshot to a new instance, advance the LSN and replicate from there - boom, you have a logical replica with all the data. Then you upgrade your logical replica.

This is a great recipe but needs small but important correction. We need to be careful with plugging pg_upgrade in this physical-to-logical replica conversion process: if we first start logical replication and then running pg_upgrade, there are risks of corruption – see discussion in pgsql-hackers https://www.postgresql.org/message-id/flat/20230217075433.u5.... To solve this, we need first to create logical slot, advance the new cluster to slot's LSN position (not starting logical replication yet), then run pg_upgrade, and only then logical replication – when the new cluster is already running on new PG version.

This is exactly how we (Postgres.ai) recently have helped GitLab upgrade multiple multi-TiB clusters under heavy load without any downtime at all (also involving PgBouncer's PAUSE/RESUME) - there will be a talk by Alexander Sosna presented later this week https://www.postgresql.eu/events/pgconfeu2023/schedule/sessi... and there are some plans to publish details about it.

1 comments

Thank you for linking this insightful discussion.

I am not sure why I never ran into this problem, unfortunately I don't have access to my notes anymore because I no longer work on this.

This approach has solved so many problems for me. I can do full vacuum, I can change integer columns to bigint, I can do major version upgrades, I can even move instances across AWS regions all with minimal downtime.

It's really great to see that people continue to tinker with it and that there are active discussions on the mailing list to keep improving logical replication. It's come a long way since the first implementation. Thanks for your contribution!