Stream Processing with DuckDB/Polars?
4 points by Binomial-Dist 547 days ago
I'm looking to do relatively simple streaming transformations on CDC data coming from Postgres. I'm dealing with a relatively small amount of data (thousands to tens of thousands of rows per minute in most situations), which makes something like Flink + Kafka seem like overkill. I could engineer something custom that builds directly on Postgres logical replication, but I'd prefer something that plugs cleanly into DuckDB or Polars. After some research, I'm a little surprised there isn't much out there aimed at these relatively simple single-node processing situations. There are some tools (e.g. pg_replicate), but I'd like something more oriented around the Arrow data ecosystem. I'm curious whether anyone has built something custom here that worked well, or whether there are tools I'm missing.
Debezium + Arrow Flight: Use Debezium as a library to capture PostgreSQL CDC events and stream them into Arrow for columnar processing. Arrow data works well with both Polars and DuckDB.
RisingWave: A stream processor that connects directly to Postgres CDC, lets you write the transformations in SQL, and keeps the results incrementally updated as changes arrive. No Kafka or other heavy setup required.
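For the "build something custom" route, here is a minimal sketch of the single-node micro-batch idea in plain Python. The event shape (`op`, `key`, `row`) and the helper names `micro_batches`, `apply_batch` and `to_columns` are hypothetical and made up for illustration; they are not the wire format of Postgres logical replication, Debezium, or pg_replicate. The sketch assumes every row has the same columns.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical change-event shape, for illustration only:
#   {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...} or None}
Event = Dict[str, Any]


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group a stream of change events into lists of at most batch_size."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def apply_batch(state: Dict[Any, Dict[str, Any]], batch: List[Event]) -> None:
    """Apply one batch in order so state holds the latest row per key."""
    for event in batch:
        if event["op"] == "delete":
            state.pop(event["key"], None)
        else:  # insert and update both overwrite the row for that key
            state[event["key"]] = event["row"]


def to_columns(state: Dict[Any, Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented state into a column-oriented dict of lists."""
    rows = list(state.values())
    if not rows:
        return {}
    names = list(rows[0].keys())  # assumes a uniform schema across rows
    return {name: [row.get(name) for row in rows] for name in names}


# Small in-memory stand-in for a CDC feed.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "paid"}},
    {"op": "delete", "key": 2, "row": None},
    {"op": "insert", "key": 3, "row": {"id": 3, "status": "new"}},
]

state: Dict[Any, Dict[str, Any]] = {}
for batch in micro_batches(events, batch_size=2):
    apply_batch(state, batch)

columns = to_columns(state)
```

The column-oriented dict is the hand-off point to the Arrow side: it is the shape that `polars.DataFrame(columns)` or `pyarrow.Table.from_pydict(columns)` accept, and DuckDB can query the resulting Arrow table. At a few thousand rows per minute, rebuilding the columns once per batch is probably cheap enough, though that is an assumption worth measuring.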