by chrisjc 2306 days ago
Would it be fair to say this is a more OLAP-oriented approach to what KSqlDB (not KSql, but https://ksqldb.io/) does?

Seems that it's perhaps lacking the richness of how ksqldb uses Kafka Connectors (sinks and sources), but I don't see any reason you couldn't use Materialize in conjunction with ksqldb.

E.g.:

KC-source --> ksql --> materialize --> kafka --> KC-sink

Questions for Materialize:

What connectors (sinks and sources) do you have or plan to develop? Seems like it's mostly Kafka in and out at the moment.

Why would I use this over KSqlDB?

Can I snapshot and resume from the stream? Or do I need to rehydrate to re-establish state?

2 comments

> Would it be fair to say this is a more OLAP-oriented approach to what KSqlDB (not KSql, but https://ksqldb.io/) does?

I'm not sure I'd say it's "more OLAP." ksqlDB is about as OLAP as it gets, considering it doesn't support any sort of transactions or consistency. We think Materialize is quite a bit more powerful than what ksqlDB offers, thanks to the underlying technologies (timely/differential). For example, our joins are proper SQL joins, and don't require you to reason about the complicated and confusing difference between a stream and a table (https://docs.ksqldb.io/en/latest/developer-guide/joins/join-...). We also have preliminary support for maintaining the consistency properties of upstream OLTP data sources, and we'll be rolling out a more complete story here shortly.

> Seems that it's perhaps lacking the richness of how ksqldb uses Kafka Connectors (sinks and sources), but I don't see any reason you couldn't use Materialize in conjunction with ksqldb.

Is there something in particular about ksqlDB connectors that we don't seem to support? Our CREATE SOURCE command is quite powerful: https://materialize.io/docs/sql/create-source/.

> What connectors (sinks and sources) do you have or plan to develop? Seems like it's mostly Kafka in and out at the moment.

We already support file sources in a variety of formats, and support for Amazon Kinesis is on the short-term roadmap: https://github.com/MaterializeInc/materialize/issues/1239.

Are there other connector types you'd like to see?

> Can I snapshot and resume from the stream? Or do I need to rehydrate to re-establish state?

At the moment you can't snapshot and resume, but support for this is planned.

Thanks for the detailed response. At this point I think the onus is on me to go and take a deeper look into timely/differential.

Important "sources" to me are obviously Kafka, but also MySQL and Mongo. Important "sinks" would be Snowflake (maybe through S3, or directly through PUT) and Elasticsearch. Although I imagine you might soon be telling me that you don't need a data warehouse once you have Materialize :)

Presumably one reason to use this is latency. Materialize is built on timely and differential dataflow, frameworks by Frank McSherry (who was at Microsoft Research Silicon Valley back in the day and is elsewhere in this HN discussion). They are intended to reduce the amount of computation by updating results incrementally as the inputs change, rather than recomputing them from scratch. Materialized views are particularly ripe for those advances.
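
To make the "reduce the amount of computation" point concrete, here is a toy Python sketch of incremental view maintenance. It is not Materialize's or differential dataflow's API: the names (IncrementalCountView, recompute) are made up for illustration, and the view is assumed to be a simple COUNT(*) ... GROUP BY key.

```python
from collections import defaultdict


class IncrementalCountView:
    """Hypothetical stand-in for a materialized view:
    SELECT key, COUNT(*) FROM input GROUP BY key."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, changes):
        """Apply a batch of (key, diff) changes.

        diff is +1 for an inserted row and -1 for a deleted row.
        Only the keys that appear in the batch are touched.
        """
        for key, diff in changes:
            self.counts[key] += diff
            if self.counts[key] == 0:
                # Drop keys whose rows have all been deleted.
                del self.counts[key]

    def snapshot(self):
        """Return the current contents of the view."""
        return dict(self.counts)


def recompute(rows):
    """Baseline for comparison: rebuild the view by scanning every row."""
    counts = defaultdict(int)
    for key in rows:
        counts[key] += 1
    return dict(counts)


view = IncrementalCountView()
view.apply([("a", +1), ("b", +1), ("a", +1)])  # three inserts
view.apply([("a", -1), ("c", +1)])             # one delete, one insert
result = view.snapshot()
```

Each call to apply does work proportional to the size of the batch, while recompute scans the whole input every time. As I understand it, differential dataflow carries the same idea much further (joins, iteration), so treat this only as an illustration of the principle.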

It's also written in Rust instead of Java, so there's no JVM memory overhead or GC pauses to contend with.