by abeppu 1771 days ago
I question the premise that MapReduce really ever went away. Many migrated away from Hadoop, but in frameworks that succeeded it, MapReduce was still a core pattern. And in some cases, moving away from Hadoop wasn't ideal because later frameworks still got some things wrong. Maybe we stopped talking about MapReduce because we were focused on new patterns and challenges -- how to support many complex jobs and pipelines, more interactive and exploratory analysis, etc.

I'm curious about the difference between "continuous MapReduce" and I guess a subgraph in a "differential dataflow" (which I have read about but never really used). https://github.com/TimelyDataflow/differential-dataflow

1 comment

First let me say that I think Timely Dataflow and Materialize are both super cool. The two approaches are quite different, in part because they solve slightly different problems. Or maybe it's more fair to say that they think of the world in somewhat different ways. Probably most of the differences can be traced back to how Timely Dataflow relies on the expiration of timestamps in order to coordinate updates to its results. You can read the details on that in their docs (https://timelydataflow.github.io/timely-dataflow/chapter_5/c...).
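To make the "expiration of timestamps" idea a bit more concrete, here is a toy Python sketch. It is not the Timely Dataflow API (which is Rust); `FrontierBuffer`, `update`, and `advance_to` are names made up for illustration. The assumption it encodes is only the rough idea: results for a timestamp are held back until the system has been promised that no more updates for that timestamp can arrive.

```python
from collections import defaultdict


class FrontierBuffer:
    """Toy operator (hypothetical, not Timely's API): accumulates per-key
    diffs for each timestamp and releases a timestamp's result only once
    that timestamp has expired, i.e. fallen behind the frontier."""

    def __init__(self):
        # timestamp -> key -> accumulated diff
        self.pending = defaultdict(lambda: defaultdict(int))
        # every timestamp strictly below the frontier is closed
        self.frontier = 0

    def update(self, timestamp, key, diff):
        # Updates for a closed timestamp would invalidate emitted results.
        if timestamp < self.frontier:
            raise ValueError("timestamp already closed")
        self.pending[timestamp][key] += diff

    def advance_to(self, new_frontier):
        """Promise that no update with timestamp < new_frontier will arrive,
        and return the now-final results for all such timestamps."""
        self.frontier = max(self.frontier, new_frontier)
        closed = sorted(t for t in self.pending if t < self.frontier)
        return [(t, dict(self.pending.pop(t))) for t in closed]


buf = FrontierBuffer()
buf.update(0, "a", 1)
buf.update(1, "a", 1)
buf.update(0, "b", 2)
finalized = buf.advance_to(1)  # only timestamp 0 is released here
```

Nothing for timestamp 1 is emitted until the frontier moves past it, which is the coordination role the expiration of timestamps plays in this simplified picture.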

I think a reasonable TLDR might be to say that continuous MapReduce has a better fault-tolerance story, while Timely Dataflow is more efficient for things like reactive joins. They both have their purpose, though, and I imagine that both Flow and Materialize will go on to coexist as successful products.
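For a rough feel of what "continuous" MapReduce could look like, here is a minimal Python sketch. It is not Flow's actual API; `map_fn`, `reduce_fn`, `fold`, and `merge` are hypothetical names, and the word-count workload is just an example. The assumption is that the reduce step is associative, so new records can be folded into running state as they arrive, and partial results computed separately can be merged after the fact.

```python
def map_fn(record):
    # Hypothetical mapper: emit one (key, value) pair per word.
    for word in record.split():
        yield word.lower(), 1


def reduce_fn(a, b):
    # Must be associative for partial results to combine in any grouping.
    return a + b


def fold(state, records):
    """Fold newly arrived records into the running per-key state."""
    for record in records:
        for key, value in map_fn(record):
            state[key] = reduce_fn(state[key], value) if key in state else value
    return state


def merge(left, right):
    """Combine two independently computed partial states."""
    out = dict(left)
    for key, value in right.items():
        out[key] = reduce_fn(out[key], value) if key in out else value
    return out


batch1 = ["a b a"]
batch2 = ["b c"]

# Results update as each batch arrives...
incremental = fold(fold({}, batch1), batch2)
# ...and match what you get by reducing the batches separately and merging.
merged = merge(fold({}, batch1), fold({}, batch2))
```

Because both paths give the same answer, a partial result that is lost can be recomputed from its inputs and merged back in. Whether that is the whole reason behind the fault-tolerance point is my assumption, not something spelled out here.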