by dyoo1979 4372 days ago
You'll probably want to read the FlumeJava paper. http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/F...

Citation: http://dl.acm.org/citation.cfm?id=1806638

The key word is pipeline. If you have an analysis that runs in several stages, you take the output of one stage and feed it into the next. Raw MapReduce gives you very little help with that chaining; you end up wiring the stages together yourself.
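
To make the chaining problem concrete, here's a rough sketch in plain Python. It is a toy, in-memory stand-in for MapReduce, not any real framework's API; `map_reduce` and all the stage functions are names I made up for illustration. Two stages (word count, then "how many distinct words occur n times") are chained by hand, and the glue between them is entirely your job.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce (hypothetical helper): map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Stage 1: word count.
def split_words(line):
    for word in line.split():
        yield word, 1

def sum_counts(word, ones):
    return word, sum(ones)

# Stage 2: how many distinct words occur n times?
def key_by_count(pair):
    word, count = pair
    yield count, word

def count_words(count, words):
    return count, len(words)

lines = ["a b a", "b c d"]
# The chaining is manual: you carry stage1's output into stage2 yourself.
stage1 = map_reduce(lines, split_words, sum_counts)
stage2 = map_reduce(stage1, key_by_count, count_words)
print(stage2)
```

With a real cluster, that hand-carried `stage1` would be intermediate files, job scheduling, and cleanup, repeated for every extra stage.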

The paper describes a nicer way to do the chaining. The system takes care of writing the raw MapReduces for you, and it also does a lot of work on the connections between your stages.
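
As a rough illustration of the idea (not FlumeJava's actual API or optimizer), here's a toy Python sketch of a pipeline that only records its stages, then plans before running. The `Pipeline` class, `group_and_combine`, and `_fuse` are hypothetical names; `parallel_do` only loosely echoes the paper's parallelDo. The one piece of "work on the connections" shown is merging adjacent element-wise stages into a single pass.

```python
from collections import defaultdict

def _fuse(first, second):
    """Compose two element-wise stages into one (hypothetical helper)."""
    def fused(item):
        for mid in first(item):
            yield from second(mid)
    return fused

class Pipeline:
    """Toy deferred pipeline: records stages, runs nothing until run()."""

    def __init__(self, source, stages=()):
        self.source = source
        self.stages = tuple(stages)

    def parallel_do(self, fn):
        # fn maps one element to an iterable of output elements.
        return Pipeline(self.source, self.stages + (("do", fn),))

    def group_and_combine(self, combine):
        # Groups (key, value) pairs by key, then combines each value list.
        return Pipeline(self.source, self.stages + (("group", combine),))

    def plan(self):
        """Fuse each run of consecutive 'do' stages into a single stage."""
        planned = []
        for kind, fn in self.stages:
            if kind == "do" and planned and planned[-1][0] == "do":
                planned[-1] = ("do", _fuse(planned[-1][1], fn))
            else:
                planned.append((kind, fn))
        return planned

    def run(self):
        data = list(self.source)
        for kind, fn in self.plan():
            if kind == "do":
                data = [out for item in data for out in fn(item)]
            else:
                groups = defaultdict(list)
                for key, value in data:
                    groups[key].append(value)
                data = [(key, fn(values)) for key, values in sorted(groups.items())]
        return data

counts = (
    Pipeline(["a b a", "b c d"])
    .parallel_do(lambda line: line.split())
    .parallel_do(lambda word: [(word, 1)])
    .group_and_combine(sum)
    .parallel_do(lambda kv: [(kv[1], 1)])
    .group_and_combine(sum)
)
result = counts.run()
print(len(counts.stages), len(counts.plan()), result)
```

You write five stages and the plan runs four, because the first two element-wise steps get merged. The point is that you describe the whole pipeline up front, so the system gets a chance to rearrange it before anything executes.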