Hacker News new | ask | show | jobs
by H8crilA 882 days ago
There really was always only Map and Shuffle (Reduce is just Shuffle+Map; also another name for Shuffle is GroupByKey). And you see those primitives under the hood of most parallel systems.
2 comments

Shuffle is interesting, I gotta read up on that. Maybe I've been hearing reduce for too long and have too much of a built-in visual sense of it but...shuffle does not seem like the right name at all, then I picture randomizing some set N, where the input and output counts are the same.
Shuffle is an operation that converts "{k1, v1}, {k1, v2}, {k2, v3}" into "{k1, [v1, v2]}, {k2, [v3]}".
Reduce is useful for aggregate metrics.
My point is that Reduce is Shuffle+Map, without materializing the intermediate result (the result after Shuffle).