| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by aeyes 925 days ago

There is a better way than fully copying table content one by one which is very I/O heavy and will not work if you have very large tables.

You can create a replication slot, take a snapshot, restore the snapshot to a new instance, advance the LSN and replicate from there - boom, you have a logical replica with all the data. Then you upgrade your logical replica.

This article from Instacart shows how to do it: https://archive.ph/K5ZuJ

If I remember correctly the article has some small errors but I haven't done this in a while and I don't exactly remember what was wrong. But in general the process works, I have done it like this several times upgrading TB-sized instances.

5 comments

samokhvalov 925 days ago

> You can create a replication slot, take a snapshot, restore the snapshot to a new instance, advance the LSN and replicate from there - boom, you have a logical replica with all the data. Then you upgrade your logical replica.

This is a great recipe but needs small but important correction. We need to be careful with plugging pg_upgrade in this physical-to-logical replica conversion process: if we first start logical replication and then running pg_upgrade, there are risks of corruption – see discussion in pgsql-hackers https://www.postgresql.org/message-id/flat/20230217075433.u5.... To solve this, we need first to create logical slot, advance the new cluster to slot's LSN position (not starting logical replication yet), then run pg_upgrade, and only then logical replication – when the new cluster is already running on new PG version.

This is exactly how we (Postgres.ai) recently have helped GitLab upgrade multiple multi-TiB clusters under heavy load without any downtime at all (also involving PgBouncer's PAUSE/RESUME) - there will be a talk by Alexander Sosna presented later this week https://www.postgresql.eu/events/pgconfeu2023/schedule/sessi... and there are some plans to publish details about it.

link

aeyes 924 days ago

Thank you for linking this insightful discussion.

I am not sure why I never ran into this problem, unfortunately I don't have access to my notes anymore because I no longer work on this.

This approach has solved so many problems for me. I can do full vacuum, I can change integer columns to bigint, I can do major version upgrades, I can even move instances across AWS regions all with minimal downtime.

It's really great to see that people continue to tinker with it and that there are active discussions on the mailing list to keep improving logical replication. It's come a long way since the first implementation. Thanks for your contribution!

link

brentjanderson 925 days ago

OP here: we looked at this and were not confident in manually advancing the LSN as proposed, and detecting any inconsistency if we missed any replication as a result. Table by table seemed more reliable, despite being more painstaking.

link

aeyes 925 days ago

As long as you have the correct LSN there is no way for this to go wrong.

If you resume replication with an incorrect LSN replication will break immediately. I have spent way too much time trying to do this on my own before the blog post was written and I have seen it fail over and over again.

To give you more confidence, try with the LSN from the "redo starts at" log message. It looks close but it will always fail.

link

mmontagna9 925 days ago

Sadly this isn't true. Postgres will happily replicate and skip data if you tell it too.

And there have been multiple bugs around logical replication in version ~10-15 that can cause data loss. None of these are directly related to lsn fiddling tho.

link

AlisdairO 924 days ago

Indeed - People In The Know have some cocnerns with this approach: https://ardentperf.com/2021/07/26/postgresql-logical-replica... .

At $work we did use this approach to upgrade a large, high throughput PG database, but to mitigate the risk we did a full checksum of the tables. This worked something like:

    * Set up logical replica, via 'instacart' approach
    * Attach physical replicas to the primary instance and the logical replica, wait for catchup
    * (very) briefly pause writes on the primary, and confirm catchup on the physical replicas
    * pause log replay on the physical replicas
    * resume writes on the primary
    * checksum the data in each physical replica, and compare

This approach required <1s write downtime on the primary for a very comprehensive data validation.

link

vittore 924 days ago

We have updated article https://tech.instacart.com/zero-downtime-postgresql-cutovers...

link

imbradn 924 days ago

That article covers the basis of how we do upgrades at Instacart, but is quite old. This is a more modern look at how we accomplish the process. We have used this process to upgrade a lot of very large and very active databases successfully.

https://www.instacart.com/company/how-its-made/zero-downtime...

link

mmontagna9 925 days ago

You caught our off by one bug :)

link