| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by frankmcsherry 2236 days ago

Hi folks. Frank from Materialize here.

The main differences you should expect to see are generality and performance.

Generality, in that there are fewer limitations on what you can express. Oracle (and most RDBMSs) build their Incremental View Maintenance on their existing execution infrastructure, and are limited by the queries whose update rules they can fit in that infrastructure. We don't have that limitation, and are able to build dataflows that update arbitrary SQL92 at the moment. Outer joins with correlated subqueries in the join constraint; fine.

Performance, in that we have the ability to specialize computation for incremental maintenance in a way that RDBMSs are less well equipped to do. For example, if you want to maintain a MIN or MAX query, it seems Oracle will do this quickly only for insert-only workloads; on retractions it re-evaluates the whole group. Materialize maintains a per-group aggregation tree, the sort of which previously led to a 10,000x throughput increase for TPCH Query 15 [0]. Generally, we'll build and maintain a few more indexes for you (automatically) burning a bit more memory but ensuring low latencies.

As far as I know, Timescale's materialized views are for join-free aggregates. Systems like Druid were join-free and are starting to introduce limited forms. KSQLdb has the same look and feel, but a. is only eventually consistent and b. round-trips everything through Kafka. Again, all AFAIK and could certainly change moment by moment.

Obviously we aren't allowed to benchmark against Oracle, but you can evaluate our stuff and let everyone know. So that's one difference.

[0]: https://github.com/TimelyDataflow/differential-dataflow/tree...

2 comments

sradman 2236 days ago

Materialize sounds like it has reinvented Michael Stonebraker's StreamSQL [1] and SAP's Continuous Computation Language (CCL) [2] which was created as part of a StreamSQL competitor named Coral8 and lives on in an enterprise product now named SAP HANA Smart Data Streaming. This space has gone by many names, Streaming SQL, Complex Event Processing (CEP), and Event Streaming.

I think the Continuous Computation Language (CCL) name captures the essence of these systems: data flows through the computation/query.

These systems have always had promise but none have found anything but niche adoption. The two most popular use cases seem to be ETL-like dataflows and OLAP style Window queries incrementally updated with streaming data (e.g. computations over stock tick data joined with multiple data sources).

[1] https://en.wikipedia.org/wiki/StreamSQL

[2] https://help.sap.com/doc/PRODUCTION/e1b391d2a3f3439fbab27ed8...

link

frankmcsherry 2236 days ago

The projects you've mentioned are attempts to address stream processing needs with a SQL-like language. That is fundamentally different from providing incremental view maintenance of actual SQL using streaming techniques (what Materialize does).

If you want to maintain the results of a SQL query with a correlated subquery, StreamSQL in Aurora did not do that (full query decorrelation is relatively recent, afaiu). I have no idea what TIBCO's current implementation does.

If you want to maintain the results of a SQL query containing a WITH RECURSIVE fragment, you can do this in differential dataflow today (and in time, Materialize). I'm pretty sure you have no chance of doing this in StreamSQL or CCL or CQL or BEAM or ...

The important difference is that lots of people do actually want to maintain their SQL queries, and are not satisfied with "SQL inspired" languages that are insert-only (Aurora), or require windows on joins, or only cover the easy cases.

link

sradman 2235 days ago

> The projects you've mentioned are attempts to address stream processing needs with a SQL-like language. That is fundamentally different from providing incremental view maintenance of actual SQL using streaming techniques (what Materialize does).

With all due respect, CREATE SINK and CREATE SOURCE are SQL-like. I would argue that the pipeline created from the set of SINKs and SOURCEs is the key concept to grasp for developers new to your platform. The purity of the PostgreSQL CREATE MATERIALIZED VIEW syntax and other PG/SQL constructs seems like a minor selling point, in my (very narrowly informed) opinion. I hope I'm wrong.

Our difference of opinion involves marketing language and perceived differentiators. There are some important use cases for continuous SQL queries over Kafka-like data streams that remain unaddressed (as far as I know). I hope Materialize gains traction where others have failed to do so. If PG/SQL compatibility was the only thing holding back this style of solution then kudos to you and your team for recognizing it. Good luck (honestly).

link

cevian 2236 days ago

Just as a point of clarification: timescale materialized views don't allow joins during materialization. But you can join materialized views arbitrarily at query time. (TimescaleDB engineer here)

link