by jgraettinger1 1060 days ago
> Postgres has a different issue in that replication slots can grow really fast, esp on AWS! [2]. We ended up writing our own custom snapshotter for Postgres that is Debezium compatible to onboard customers that have a massive dataset and cannot afford to have a read lock on their WAL.

Does Debezium's DDD-3 watermark (DBLog) implementation for Postgres not process the WAL quickly enough? We don't use it ourselves either, but architecturally it looks like it should reasonably bound how long the WAL can remain unread.
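
For anyone unfamiliar with the watermark idea, here is a rough sketch of it as I understand it from the DBLog design. It is not Debezium's actual code: `merge_chunk`, the event tuples, and the watermark ids are all hypothetical names made up for illustration. The point is that the snapshot is taken in small chunks while the log keeps being read, so the WAL never waits on a full-table scan.

```python
def merge_chunk(log_events, chunk_rows, low, high):
    """Interleave one snapshot chunk with the change log (illustrative only).

    log_events: ordered list of ("watermark", id) or ("change", key, row).
    chunk_rows: dict of key -> row, selected from the table after the low
                watermark was written and before the high watermark was written.
    """
    out = []
    chunk = dict(chunk_rows)  # copy so the caller's dict is not mutated
    in_window = False
    for ev in log_events:
        if ev[0] == "watermark":
            if ev[1] == low:
                in_window = True
            elif ev[1] == high:
                # Rows still in the chunk were not touched inside the window,
                # so they are safe to emit ahead of any later log events.
                out.extend(("snapshot", k, r) for k, r in chunk.items())
                chunk.clear()
                in_window = False
            continue
        _, key, row = ev
        if in_window:
            # The log has a newer version of this key; drop the chunk copy.
            chunk.pop(key, None)
        out.append(("change", key, row))
    return out
```

Because each chunk is bounded, the reader only pauses on the log for the time between one pair of watermarks, which is why I'd expect the unread WAL to stay bounded too.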

Agreed that many production DBs people care about have pretty severe limitations here! Managed Supabase is another good example.

1 comment

On a single Debezium instance with unbounded CPU and memory, running on a VM and extracting from Postgres, I was able to clock about 7-10m/hr. You could increase the number of tasks, but then it'll hurt your DB performance. Also, this is all running against your primary DB.

We found it far more efficient and less risky to run CDC streaming and snapshotting (without a read lock) in parallel, writing to two different topics. Once the snapshot is done and its topic is drained, we move on to draining the CDC topic.
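
A minimal sketch of that drain order, using in-memory lists in place of real topics. `apply_topics` and the `(key, row)` record shape are hypothetical, not our actual pipeline. It assumes the CDC stream was started no later than the snapshot, and that the target is written with idempotent upserts keyed by primary key.

```python
from typing import Dict, Iterable, Optional, Tuple

# A record is (primary_key, row); row=None marks a delete in the CDC topic.
Record = Tuple[int, Optional[dict]]


def apply_topics(snapshot_topic: Iterable[Record],
                 cdc_topic: Iterable[Record]) -> Dict[int, dict]:
    target: Dict[int, dict] = {}

    # Phase 1: drain the snapshot topic completely.
    for key, row in snapshot_topic:
        if row is not None:
            target[key] = row

    # Phase 2: replay the CDC topic on top. Later changes overwrite the
    # snapshot rows, so a stale snapshot read gets corrected here.
    for key, row in cdc_topic:
        if row is None:
            target.pop(key, None)
        else:
            target[key] = row
    return target


snapshot = [(1, {"v": "a"}), (2, {"v": "b_old"})]
cdc = [(2, {"v": "b_new"}), (3, {"v": "c"}), (1, None)]
final_state = apply_topics(snapshot, cdc)
```

The target can briefly hold older values while the CDC topic replays, but it converges once that topic is drained, since every change made during the snapshot is also in the CDC topic.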