by necubi
667 days ago
I'm the creator of Arroyo (and have talked a lot with the Denormalized folks), so maybe I can answer from my perspective (Matt and Amey, please correct me on any inaccuracies).

First, the similarities: both Arroyo and Denormalized use DataFusion and Arrow, and both are focused on high-scale, low-latency stateful stream processing. Arroyo has been around a lot longer and is overall more mature. It's distributed (I believe Denormalized is a single-node engine at this point), supports consistent snapshotting of its state, event time, and watermarks, and has a wide range of supported connectors (https://doc.arroyo.dev/connectors). It ships with a control plane, distributed schedulers, and a web UI.

But the use cases we're targeting are different. Arroyo is programmed via SQL and is used primarily for real-time data pipelines; we aim to replace Flink SQL and kSQL. Denormalized (as I understand it) is focused more on data science use cases where it makes sense to have an embedded engine rather than a distributed one. It's programmed with a Rust dataframe API (and soon Python).