by necubi 667 days ago
I'm the creator of Arroyo (and have talked a lot with the Denormalized folks), so maybe I can answer from my perspective (Matt and Amey, please correct me on any inaccuracies).

First the similarities: both Arroyo and Denormalized use DataFusion and Arrow and are focused on high-scale, low-latency stateful stream processing.

Arroyo has been around a lot longer and is overall more mature. It's distributed (I believe Denormalized is a single-node engine at this point), supports consistent snapshotting of its state as well as event time and watermarks, and has a wide range of supported connectors (https://doc.arroyo.dev/connectors). It ships with a control plane, distributed schedulers, and a web UI.

But the use cases we're targeting are different. Arroyo is programmed via SQL and is used primarily for real-time data pipelines; we aim to replace Flink SQL and kSQL.

Denormalized (as I understand it) is focused more on data science use cases where it makes sense to have an embedded engine, rather than a distributed one. It's programmed with a Rust dataframe API (and soon Python).