Hacker News
by aeyes 925 days ago
As long as you have the correct LSN, there is no way for this to go wrong.

If you resume replication with an incorrect LSN, replication will break immediately. I spent way too much time trying to do this on my own before the blog post was written, and I saw it fail over and over again.

To give you more confidence, try it with the LSN from the "redo starts at" log message. It looks close, but it will always fail.
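An LSN is printed as two hexadecimal halves, `hi/lo`, which together form one 64-bit WAL position. Below is a minimal sketch for pulling LSNs out of server log lines and ordering them, assuming lines shaped like PostgreSQL's `redo starts at` and `redo done at` recovery messages (the sample lines and values are invented). It does not decide which LSN is correct to resume from. It only shows that the "redo starts at" position, which is where recovery began replaying, sits earlier in the WAL than where replay finished.

```python
import re

# An LSN such as "0/3000060": two hex fields separated by a slash.
LSN_RE = re.compile(r"([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")


def parse_lsn(text: str) -> int:
    """Convert an LSN like '16/B374D848' into a single 64-bit WAL position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(pos: int) -> str:
    """Inverse of parse_lsn, printed in PostgreSQL's upper-case hex style."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"


def lsn_after(log_lines, marker: str):
    """Return the LSN following `marker` in the last matching line, or None."""
    found = None
    for line in log_lines:
        idx = line.find(marker)
        if idx == -1:
            continue
        # Search only the text after the marker, so the first LSN found is its argument.
        m = LSN_RE.search(line[idx + len(marker):])
        if m:
            found = parse_lsn(m.group(0))
    return found


# Hypothetical log excerpt; the LSN values are made up for illustration.
log = [
    "LOG:  redo starts at 0/1000028",
    "LOG:  redo done at 0/3000060 system usage: ...",
]
start = lsn_after(log, "redo starts at")
done = lsn_after(log, "redo done at")
print(format_lsn(start), format_lsn(done), done - start)  # prints: 0/1000028 0/3000060 33554488
```

Converting to integers makes "close" measurable: the two positions in this made-up excerpt are about 32 MiB of WAL apart, even though the printed strings look similar.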


Sadly this isn't true. Postgres will happily replicate and skip data if you tell it to.
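Both claims can hold at once, depending on which way the LSN is off. Here is a deliberately simplified toy model (not how PostgreSQL actually applies changes): a change stream of `(lsn, op, key, value)` records applied to a dict that stands in for a table with a primary key. Starting too early re-applies old changes, which fails loudly only if the overlap contains an INSERT that collides. Starting too late silently skips whatever lies in between. All names and values here are hypothetical.

```python
class DuplicateKey(Exception):
    """Raised when an INSERT hits a key that already exists."""


# Hypothetical change stream: (lsn, operation, primary key, value).
STREAM = [
    (100, "insert", 1, "a"),
    (200, "insert", 2, "b"),
    (300, "update", 1, "a2"),
    (400, "insert", 3, "c"),
]


def apply_from(table: dict, stream, start_lsn: int) -> dict:
    """Apply every change with lsn >= start_lsn to a copy of `table`."""
    table = dict(table)
    for lsn, op, key, value in stream:
        if lsn < start_lsn:
            continue
        if op == "insert":
            if key in table:
                raise DuplicateKey(f"key {key} already exists (lsn {lsn})")
            table[key] = value
        elif op == "update":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table


# The new replica was seeded with everything up to and including LSN 200.
seeded = {1: "a", 2: "b"}

correct = apply_from(seeded, STREAM, 300)    # {1: 'a2', 2: 'b', 3: 'c'}
too_late = apply_from(seeded, STREAM, 400)   # {1: 'a', 2: 'b', 3: 'c'}: the update is silently lost
try:
    apply_from(seeded, STREAM, 200)          # too early: re-inserts key 2
except DuplicateKey as exc:
    print("replication breaks:", exc)
```

In this model, a too-early start whose overlap held only UPDATEs would also apply cleanly, so "breaks immediately" is not guaranteed in that direction either.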

And there have been multiple bugs around logical replication in versions ~10-15 that can cause data loss. None of these are directly related to LSN fiddling, though.

Indeed - People In The Know have some concerns with this approach: https://ardentperf.com/2021/07/26/postgresql-logical-replica... .

At $work we did use this approach to upgrade a large, high-throughput PG database, but to mitigate the risk we did a full checksum of the tables. It worked something like this:

    * Set up a logical replica using the 'instacart' approach
    * Attach physical replicas to both the primary and the logical replica, and wait for them to catch up
    * (Very) briefly pause writes on the primary, and confirm that the physical replicas have caught up
    * Pause WAL replay on the physical replicas
    * Resume writes on the primary
    * Checksum the data on each physical replica, and compare the results

This approach required less than 1s of write downtime on the primary for a very comprehensive data validation.
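A rough sketch of two pieces of that procedure: the catch-up check and the per-table checksum comparison. Assumptions, all hypothetical: `get_replay_lsn` is a callable that, on a real standby, might run `SELECT pg_last_wal_replay_lsn()` and convert the result to an integer; the target would be the upstream's `pg_current_wal_lsn()` taken while writes are paused; replay itself would then be paused with `pg_wal_replay_pause()`; and each table's rows are fetched with `ORDER BY` on the primary key, so both sides stream them in the same order. Connection and fetching code is left out.

```python
import hashlib
import time
from typing import Callable, Iterable, Sequence


def wait_for_catchup(get_replay_lsn: Callable[[], int], target_lsn: int,
                     timeout_s: float = 5.0, poll_s: float = 0.05) -> bool:
    """Poll until the replica's replay position reaches target_lsn, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_replay_lsn() >= target_lsn:
            return True
        time.sleep(poll_s)
    return get_replay_lsn() >= target_lsn


def table_digest(rows: Iterable[Sequence]) -> str:
    """Row count plus SHA-256 over rows that must arrive sorted by primary key.

    repr() keeps None and '' distinct and escapes newlines inside strings,
    so the b"\\n" separator between rows is unambiguous.
    """
    h = hashlib.sha256()
    count = 0
    for row in rows:
        h.update(repr(tuple(row)).encode("utf-8"))
        h.update(b"\n")
        count += 1
    return f"{count}:{h.hexdigest()}"


def compare_replicas(old_tables: dict, new_tables: dict) -> list:
    """Names of tables whose digests differ or that exist on only one side."""
    mismatched = []
    for name in sorted(set(old_tables) | set(new_tables)):
        if name not in old_tables or name not in new_tables:
            mismatched.append(name)
        elif table_digest(old_tables[name]) != table_digest(new_tables[name]):
            mismatched.append(name)
    return mismatched
```

Hashing on the client means both sides must decode identical data to identical Python values (same driver, same type mappings), which is worth checking when the two replicas run different major versions.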