Hacker News new | ask | show | jobs
by TechIsCool 915 days ago
With the mention of AWS RDS and Aurora, I am curious if you had thought about creating a replication slot, adding a replica to the cluster and then promoting the replica to its own cluster. Then connecting the new cluster to the original with the replication slot based on the position of the snapshot. This would save the large original replication time and also keep the sequences consistent without manual intervention.
1 comments

That's a very interesting approach, I'm not sure if the sequences would remain consistent under that model or not. AWS RDS Aurora also requires you to drop replication slots when performing version upgrades, so we would unfortunately have lost the LSNs for replication slots that we use to synchronize with other services (e.g. data warehouse).

I'd look into it more next time if it weren't for the fact that AWS now supports Blue/Green upgrades on Aurora for our version of Postgres. But, it's an interesting approach for sure.

Yeah its been nice to leverage this while working on some of our larger multi TB non-partitioned clusters. We have seen snapshots restore in under 10 minutes across AWS Accounts (same region) as long as you already have one snapshot shipped with the same KMS keys. We have been upgrading DBs to lift out of RDS into Aurora Serverless.

If anyone here knows how to get LSN numbers after an upgrade/cluster replacement. I would love to hear about it since its always painful to get Debezium reconnected when a cluster dies.

I looked at getting LSN numbers after an upgrade/cluster replacement, and IIRC restoring from a snapshot emits LSN information into the logs, but it's a bit of of a mixed bag as to whether or not you get the __right__ LSN out the other side. Because the LSN is more a measure of how many bytes have been written within a cluster, it's not something that meaningfully translates to other clusters, unfortunately.
Agreed, the snapshot does output a message in the logs but based on our conversations with AWS it was suggested that we use the SQL Command to determine the LSN. Sometimes depending on revision you won't get the logs and other times the log line is emitted twice based on the internal RDS consistency checks. Makes me long for GTIDs from MySQL MariaDB Galera[1]. They worked super well and we never looked back at my last company.

[1] https://mariadb.com/kb/en/using-mariadb-gtids-with-mariadb-g...

Can you expand on what those problems were you had with reconnecting? Would love to better understand this and see whether there's anything which can be improved in Debezium.