by matharmin 691 days ago
Yeah, I've dealt with some of those edge cases on AWS and GCP.

Some examples:

1. I've seen delays of hours with no messages sent on the replication protocol, likely due to a large transaction in the WAL that had not been committed when the server ran out of disk space.

2. `PgError.58P02: could not create file "pg_replslot/<slot_name>/state.tmp": File exists`

3. `replication slot "..." is active for PID ...`, with some system process holding on to the replication slot.

4. `can no longer get changes from replication slot "<slot_name>". ... This slot has never previously reserved WAL, or it has been invalidated`.

All of these require manual intervention on the server to resolve.
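For what it's worth, some of these can at least be spotted early by polling the `pg_replication_slots` view. Here is a rough sketch, not a definitive check: the `classify_slot` helper, its labels, and the 10 GiB threshold are all made up for illustration, and the `wal_status` column assumes PostgreSQL 13 or later. It only covers the slot-level symptoms (cases 3 and 4, plus WAL piling up), not the `state.tmp` error.

```python
from typing import List, Optional

# Query against the pg_replication_slots system view.
# Assumption: PostgreSQL 13+ (wal_status does not exist before that).
SLOT_QUERY = """
SELECT slot_name,
       active,
       active_pid,
       wal_status,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""


def classify_slot(
    active: bool,
    active_pid: Optional[int],
    wal_status: Optional[str],
    retained_bytes: Optional[int],
    our_pid: Optional[int] = None,
    max_retained_bytes: int = 10 * 1024**3,  # hypothetical 10 GiB threshold
) -> List[str]:
    """Return illustrative problem labels for one row of SLOT_QUERY."""
    problems = []
    if wal_status == "lost":
        # Required WAL has been removed; the slot can no longer be used.
        problems.append("invalidated")
    elif wal_status == "unreserved":
        # WAL the slot needs may be removed at the next checkpoint.
        problems.append("at_risk")
    if wal_status != "lost" and retained_bytes is None:
        # restart_lsn is NULL: the slot has not reserved any WAL.
        problems.append("no_wal_reserved")
    if active and active_pid != our_pid:
        # Some other backend currently holds the slot.
        problems.append("held_by_other_pid")
    if retained_bytes is not None and retained_bytes > max_retained_bytes:
        # The slot is holding back a lot of WAL, which eats disk space.
        problems.append("wal_backlog")
    return problems
```

Each row returned by `SLOT_QUERY` (through whatever driver you use) can be passed to `classify_slot`; an empty list means none of these particular symptoms were seen. Resolving them still takes manual work on the server.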

And that's not even taking into account HA/failover; these are just issues with logical replication on a single node. It's still a great feature though, and often worth having to deal with these issues now and then.

1 comment

Definitely agreed. It is a great feature and a building block for many complex features such as multiple primaries, zero-downtime migrations, etc. I'm also quite happy to see that with each PG version it becomes more stable and easier to use.