
by necubi 863 days ago

> SQL streaming engines really seem to be having a moment.

I definitely agree! In the past few years, a bunch of folks (including myself) who had been working with Flink/Spark Streaming/KSQL/etc. at large companies decided that the time was right for a new generation of streaming systems and started companies to do that. In my case, seeing how much users struggled to build pipelines on Flink at Lyft inspired me to build Arroyo.

I think it's really exciting after ~5 years of relative stagnation.

> As someone who is less familiar with all the players in the space, how should I think about Arroyo vs. streaming databases like Materialize or caching tools like Readyset?

There are no hard lines (and internally all of these systems look fairly similar) but the products and use cases are pretty different.

To give my gloss:

* Readyset is a very clever cache for your OLTP database that lets you push it into more analytical territory with reasonable performance, but it is still focused mostly on product use cases; the stream processing system is internal and not exposed to users.

* Materialize is designed to provide OLAP materialized views on top of your OLTP database by reading Postgres/MySQL changefeeds. It gives you up-to-date results for analytical queries without needing to replicate your Postgres data to Snowflake and repeatedly query it there.

* Arroyo is a modern Flink, designed for more traditional stream processing use cases. That includes real-time analytics, but Arroyo is more focused on operational and product use cases like alerting, real-time ML, automated remediation, and streaming ETL.

Also, Arroyo is the only one of these that is fully open source (Apache 2.0) and designed for self-hosting.