| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by frankmcsherry 2206 days ago

By moving up the stack a bit (managing computation, rather than storage) we can provide consistency using techniques other than just using the guarantees provided by the storage itself. This isn't a new observation (Dryad/DryadLINQ made it wrt MapReduce/Hadoop, among other examples I'm sure) but that is where the trade-off lies.

If you instead implement low-latency systems where each step along a dataflow involves a round-trip through replicated highly available storage, Spanner if you like or even just Kafka, then 100% you might reasonably conclude that eventual consistency is the right call. This is roughly the situation that microservice implementors currently find themselves in. I don't think it is a great situation to be in, personally.

The value proposition with something like Materialize (and there are other options) is that you can get consistency and performance if you can express your computation as something more structured than imperative code that writes to and reads from storage. In our case, the "something" is SQL.

Hope that helps!

1 comments

virgilp 2206 days ago

Hey - great work with materialize.io, I've always wanted to play more with it but life always got in the way so far :(

One question I have for you is whether it would be appropriate for processing where you need to iterate (think e.g. connected components in a graph, where you repeatedly broadcast the component ID to the neighboring nodes: can this be somehow done with materialize's version of SQL? You can of course do looping with timely - but, how do you do that with SQL?

link

frankmcsherry 2206 days ago

In SQL you would most likely be directed to use `WITH RECURSIVE`, which is something we plan to do, but not yet.

It can be a bit gross to use WITH RECURSIVE, because there are often some constraints on the types of queries you can express (e.g. that the recursive body must conclude with a UNION/UNION ALL with some base case). Differential dataflow doesn't have that requirement, but we'll have to sort out whether we'll remove that requirement for Materialize, or impose the traditional constraints. There is a Chesterton's fence moment to have first.

Whether it ends up being "appropriate" or not will be a great thing to determine. I anticipate eating a lot of crow when it turns out to be lots slower than bespoke graph processors. :)

edit: Thanks, btw!

link