by danpalmer 879 days ago
> You can do MR within a DAG, so you could say that dataflows are a generalization or superset of the MR model.

I think it's the opposite of this. MapReduce is a very generic mechanism for splitting computation up so that it can be distributed. It would be possible to build Spark/Beam and all their higher level DAG components out of MapReduce operations.


I don't mean generalization that way. Dataflow operators can be expressed with MR as the underlying primitive, as you say. But MR itself, as described in the original paper at least, only has the two stages, map and reduce; it's not a dataflow system. And it turns out people want dataflow systems; they don't want to hand-code MR jobs and wire up the DAG manually.
I'm not sure what you describe is the opposite?
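
To make the "MR as the underlying primitive" point concrete, here is a rough single-process sketch of a two-node dataflow wired up by hand from one map/reduce stage. Everything in it (`map_reduce`, `words_by_frequency`, the helper names) is made up for illustration and is not the API of Hadoop, Spark, or Beam; it only shows the shape of the manual chaining.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """One MR stage: map records to (key, value) pairs, group by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]


# Stage 1: classic word count.
def split_words(line):
    return [(word, 1) for word in line.split()]


def sum_counts(word, ones):
    return (word, sum(ones))


# Stage 2: regroup the (word, count) pairs by count.
def key_by_count(pair):
    word, count = pair
    return [(count, word)]


def collect_words(count, words):
    return (count, sorted(words))


def words_by_frequency(lines):
    # The "DAG" is just two MR stages chained by hand: the output of
    # stage 1 becomes the input records of stage 2.
    counts = map_reduce(lines, split_words, sum_counts)
    return dict(map_reduce(counts, key_by_count, collect_words))
```

With only two nodes the manual plumbing is tolerable; the argument for dataflow systems is that this wiring stops being pleasant once the graph has branches and joins.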

I mean, you can implement function calls (and other control flow operators like exceptions or loops) as GOTOs and conditional branches, and that's what your compiler does.

But that doesn't really mean it's useful to think of GOTOs being the generalisation.

Most of the time, it's just the opposite: you can think of a GOTO as a very specific kind of function call, namely a tail call without any arguments. See e.g. https://www2.cs.sfu.ca/CourseCentral/383/havens/pubs/lambda-...
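
A small sketch of that reading, under the assumption that we emulate tail calls with a trampoline (Python does not eliminate tail calls, so each "label" returns the next label instead of calling it). The names `run`, `sum_to`, `loop` and `done` are invented for this example.

```python
def run(label):
    """Trampoline: keep jumping to whatever label the previous one returned."""
    while label is not None:
        label = label()


def sum_to(n):
    # Shared mutable state stands in for the variables of the GOTO program.
    state = {"i": 1, "total": 0}

    def loop():
        if state["i"] > n:
            return done        # goto done
        state["total"] += state["i"]
        state["i"] += 1
        return loop            # goto loop

    def done():
        return None            # halt

    run(loop)
    return state["total"]
```

Each label is a function of zero arguments, and every jump is a call in tail position; in a language with proper tail calls the trampoline would be unnecessary and `return loop` would simply be `loop()`.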