by bslatkin 5657 days ago
Lemme give you a naive example.

Say you wanted to generate a heatmap using MapReduce jobs. How would you do it? You'd probably need something like this:

  1. Map location data points to (region -> weight)
  2. Reduce (region -> weight) to (region -> sum of weights)
  3. Map data points to (region -> 1)
  4. Reduce (region -> 1) to (region -> sum of points in region)
  5. Shuffle output of #2 and #4
  6. Reduce (region -> sum of points) and (region -> sum of weights) to (region -> average weight)

The pipeline API makes it easy to describe the dependencies between these separate MR jobs and to wait for each segment to complete before triggering the next, and it lets you reuse this logic as part of a larger computational workflow.
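
To make the data flow between those six steps concrete, here's a rough in-memory sketch in plain Python. It does not use the pipeline API or the Mapper framework at all; `map_reduce`, `region_of`, and `average_weights` are hypothetical names made up for illustration, and the 1-degree grid bucketing is just an assumption about what a "region" is.

```python
import math
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process MR job: map, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def region_of(point):
    # Assumption: a region is the 1-degree grid cell containing (lat, lng).
    lat, lng, _weight = point
    return (math.floor(lat), math.floor(lng))


def average_weights(points):
    # Steps 1-2: (region -> weight), then (region -> sum of weights).
    weight_sums = map_reduce(points, lambda p: [(region_of(p), p[2])], sum)

    # Steps 3-4: (region -> 1), then (region -> number of points).
    counts = map_reduce(points, lambda p: [(region_of(p), 1)], sum)

    # Step 5: bring both outputs together under the same region key,
    # tagging each value so the reducer can tell them apart.
    tagged = [(region, ("weights", v)) for region, v in weight_sums.items()]
    tagged += [(region, ("count", v)) for region, v in counts.items()]

    # Step 6: reduce the two tagged values per region to an average.
    def to_average(values):
        parts = dict(values)
        return parts["weights"] / parts["count"]

    return map_reduce(tagged, lambda pair: [pair], to_average)


if __name__ == "__main__":
    # Each point is (lat, lng, weight); the values are invented sample data.
    sample = [(37.7, -122.4, 2.0), (37.2, -122.9, 4.0), (40.1, -74.0, 5.0)]
    print(average_weights(sample))
```

Here the second and third jobs only start once the earlier results exist because they're plain sequential function calls. In a real deployment each `map_reduce` call would be its own asynchronous job, and tracking that "wait, then feed the output forward" dependency is the part a pipeline layer would handle.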

The Mapper framework/MapReduce integration part is not ready yet, but we're getting there. Release early/often~

ps. For those of you who know how to do a heatmap in a single MR: I'm just trying to demonstrate why you may need to pass inputs/outputs between MR jobs.