Hacker News new | ask | show | jobs
by olavgg 1100 days ago
If you want to listen to database changes, check out Debezium. Instead of triggers it takes advantage of the more recent CDC functionality that most SQL Servers has implemented today. The difference is that triggers works on every transaction while CDC works on a redo log file. This makes it possible to transfer database changes with minimal performance impact.
10 comments

If you’d like to stay in the Elixir world, I’ve created WalEx:

https://github.com/cpursley/walex

I hadn't seen WalEx. Looks rad!!
It’s mostly just ripped off code from Supabase and refactored to use Postgrex Replication based on the video you did.
<3
One of my clients uses a tool operating within the same space as Debezium called Qlik Replicate to listen for log-based CDC to replicate the data to a different database. The challenge we found was that not everything was recorded into the redo log, like DELETE's wrapped inside functions in one vendor's database. There was a long list of caveats we eventually uncovered beyond even the documented ones. This challenge arises even if performing CDC between two databases from the same vendor on the same version, somewhat narrowing the use cases we could apply CDC upon.

If you must care about bit-level accurate replication, catching gaps after the replication completes is we concluded after a year of chasing that chimera a deep rabbit hole with an event horizon that constantly recedes into the future. If you can tolerate some replication errors and can tolerate not knowing where they happen without a lot of investigation, then CDC works great.

CDC gets us close, but I'm still looking for someone who is working upon covering the edge cases that redo logs alone do not address.

If you want to listen to database changes in Elixir you can also get really good stuff done by using Cainophile (https://github.com/cainophile/cainophile). Same mechanism. I don't know the details of Debezium so I can't say if you are leaving fantastic things on the table. But I've had good fun with Cainophile. For example I've used it in my videos on Electric SQL to react to changes in a Postgres database. It matches nicely with realtime-ish UI via LiveView. So meshes really good with the current Elixir stack.
That library is great but pretty dated. I put together WalEx which borrowed a good bit from Cainophole but uses the new WAL listening logic in Postgrex:

https://github.com/cpursley/walex

What is dated? WalEx is still using logical replication slots, right? Trying to bring myself up to speed. Currently am using cainophile and would like to understand what WalEx is offering above and beyond. Cainophile isn't actively maintained but I think that's because it is feature complete, at least according to the author's concerns.
That is interesting. So this would mean using what I already have when using Ecto instead of the other postgres client I think cainophile uses.sounds good to me.
Yeah, originally I used cainophile's logic then switched to this:

https://hexdocs.pm/postgrex/Postgrex.ReplicationConnection.h...

Here's a great talk on Postgrex Relication:

https://www.youtube.com/watch?v=QubB19Dnxh0

> CDC works on a redo log file

Yeah I was gonna say, there are probably ways to use the PostgreSQL Write-Ahead Log (WAL) to stay up to date with every change, without having triggers. This CDC you mention sounds similar to that.

Yes. Debezium is the charm for this. If your systems are normal size, as in not Amazon scale, the embedded engine is a great addition to a codebase. It doesn't have the fault-tolerance of the full deployment, but it's suitable for many, many purposes. An issue with triggers is that they don't really live in your codebase. It's too easy to forget they even exist, and maintenance and support will eventually become impossible. The benefit of Debezium is that everything lives in your codebase and reasoning take less cognitive overhead.
Being the scale of Amazon has nothing to do with whether or not fault-tolerance is important to an application. In fact, amazon has far fewer real needs for fault-tolerance than many other businesses (its a retail site!).

You should use embedded mode if you do not require fault-tolerance and can miss updates. Otherwise, don't. Regardless of scale.

Debezium made my zero downtime db migration project possible.
Debezium is an amazing tool for bootstrapping event streams +1 would recommend

CDC is such a great concept and is so little used unfortunately

Thanks for this! I knew about CDC solutions from vendors like Informatica, but wasnt aware of Debezium.
Do you have more information on CDC?
Most popular databases have change data capture as a feature. They read backups of the transaction log and extract events they care about (dml operations against subscribed tables) and then store the results into shadow tables. They tend to be relatively low overhead compared with many hand-rolled solutions.

One other benefit is they capture all the changes to the underlying data, not just the net changes.

It’s important to realize though that CDC records change information but isn’t a mechanism to move it anywhere. You would still have to devise a means to move the data to another system.

Debezium is a data movement tool that uses CDC for the underlying tracking.