by nja 1658 days ago
Debezium is a useful tool, but requires a lot of babysitting. If the DB connection blips or DNS changes (say, if you just rebuilt your prod db), or in some other cases, it'll die and present this exact problem. Fortunately, it's easy to enable a "heartbeat" topic to alert on to make sure it can be restarted before the db disk fills (of course, db size growth alerts are critical too).

We've found that it's worth it for most use cases to switch to a vanilla JDBC Kafka Connector with frequent polling. This also allows for cases such as emitting joined data.

Other than Debezium, Postgres + Kafka + Kafka Connect builds a pretty stable system for sending data around all our different dbs, apps, and data lakes.

2 comments

Which version of Debezium was it you encountered these issues with? Connection losses should not be a problem with any current version, as the connectors will restart automatically in that case.

Agreed, though, that monitoring should be in place so that you are notified of failed connectors early on (this could be based on the heartbeat topic, but there are also JMX metrics which can be exposed to Prometheus/Grafana, and/or health checks could be set up based on the connector's status as exposed via the Kafka Connect REST API).
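
A rough sketch of the REST-based health check, assuming the usual response shape of `GET /connectors/<name>/status` (a `connector` object with a `state`, plus a `tasks` list where each task has an `id` and a `state`). The function names, and the base URL and connector name you pass in, are made up for illustration. Anything not `RUNNING` is reported, which means a deliberately paused connector shows up too.

```python
import json
from urllib.request import urlopen


def failed_components(status: dict) -> list[str]:
    """Return labels for the connector and any tasks that are not RUNNING."""
    problems = []
    connector_state = status.get("connector", {}).get("state")
    if connector_state != "RUNNING":
        problems.append(f"connector:{connector_state}")
    for task in status.get("tasks", []):
        if task.get("state") != "RUNNING":
            problems.append(f"task-{task.get('id')}:{task.get('state')}")
    return problems


def check_connector(base_url: str, name: str) -> list[str]:
    """Fetch the status document from Kafka Connect and list unhealthy parts."""
    with urlopen(f"{base_url}/connectors/{name}/status", timeout=10) as resp:
        return failed_components(json.load(resp))
```

Something like `check_connector("http://localhost:8083", "my-connector")` could run from cron or a monitoring agent and page whenever the returned list is non-empty.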

On the matter of disk growth, there's no silver bullet here. Some people will want to make 100% sure that no events are ever missed, which implies the replication slot must hold onto those WAL segments while it's not being read (this is not specific to Debezium, btw). Others may be willing to accept missing events if the slot isn't read for long enough, so that those WAL segments can be discarded. In recent Postgres versions, a max size (or age, not sure) can be configured for a replication slot, so it's a matter of configuration which behavior you want.
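
For reference, the setting in question is, as far as I know, `max_slot_wal_keep_size` (Postgres 13 and later), which is a size limit rather than an age limit. Whichever behavior you choose, it helps to alert on how much WAL a slot is holding back. Here is a minimal sketch of that arithmetic, assuming you have already read `pg_current_wal_lsn()` and the slot's `restart_lsn` from `pg_replication_slots` as text; the function names and the threshold are illustrative only.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual Postgres LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current write position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slot_needs_alert(current_lsn: str, restart_lsn: str, threshold_bytes: int) -> bool:
    """True when the slot retains more WAL than the (hypothetical) threshold."""
    return retained_wal_bytes(current_lsn, restart_lsn) > threshold_bytes
```

Inside the database, `pg_wal_lsn_diff()` does the same subtraction, so this is mostly useful when the check runs in an external script.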

In any case, connector downtime of more than, say, a few hours should show up as an alert so that action can be taken.

Disclaimer: I work on Debezium

How do you deal with late commits that have smaller identifiers? In incrementing mode, those records will be skipped.
This, and not being able to track deleted records, requires a solution like Debezium.
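
To make the late-commit case concrete, here is a toy simulation. It is not the connector's actual code, just a sketch assuming the poller asks for rows with `id > last_seen_id` and stores the largest id it saw as its offset. All names are made up for illustration.

```python
def poll_incrementing(visible_ids, last_seen_id):
    """Mimic an incrementing-column poll: return ids greater than the offset."""
    batch = sorted(i for i in visible_ids if i > last_seen_id)
    new_offset = batch[-1] if batch else last_seen_id
    return batch, new_offset


# Transaction A draws id 1 from the sequence but commits late;
# transaction B draws id 2 and commits first.
committed = {2}
offset = 0

first_batch, offset = poll_incrementing(committed, offset)   # picks up id 2
committed.add(1)                                             # A finally commits
second_batch, offset = poll_incrementing(committed, offset)  # 1 is not > 2

print(first_batch, second_batch)  # [2] []
```

Row 1 never shows up in any batch, because the stored offset has already moved past it. Log-based capture avoids this since it reads changes in commit order rather than by id.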