by basyt 4370 days ago
It's fundamentally the same thing as MapReduce, isn't it? Can someone explain the differences to me, please? There isn't much of use in the article.
3 comments

You'll probably want to read the FlumeJava paper. http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/F...

Citation: http://dl.acm.org/citation.cfm?id=1806638

The key word is pipeline. If you have some analysis that runs in several stages, you'll be taking the output of one stage, and connecting it to the next. If you want to compose multiple phases, chained together, raw MapReduce isn't going to help you very much with the chaining.

What's described in the paper is a way to do the chaining in a nice way. The system will take care of writing the raw MapReduces for you. But it'll also do a lot of work on the interconnections between your stages as well.
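To make the chaining point concrete, here is a toy in-memory sketch. It is not the FlumeJava API: `map_reduce`, `pipeline`, and the stage functions are hypothetical names made up for illustration. It only shows the shape of the problem, where each stage's output has to be wired into the next stage's input.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One toy in-memory stage: map, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort by key so the output order is deterministic.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def pipeline(records, stages):
    """Chain stages so that each stage's output becomes the next stage's input."""
    for mapper, reducer in stages:
        records = map_reduce(records, mapper, reducer)
    return records


# Stage 1: classic word count.
def split_words(line):
    return [(word, 1) for word in line.split()]


def sum_counts(word, counts):
    return (word, sum(counts))


# Stage 2: group words by how often they occurred.
def key_by_count(pair):
    word, count = pair
    return [(count, word)]


def collect_words(count, words):
    return (count, sorted(words))


lines = ["a b a", "b c a"]
result = pipeline(lines, [(split_words, sum_counts), (key_by_count, collect_words)])
print(result)
```

With raw MapReduce you would write the loop inside `pipeline` yourself, along with the intermediate files between jobs. As I read the paper, the system does that wiring for you and also optimizes across the stage boundaries.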

MapReduce wasn't designed for iterative algorithms or streaming data, whereas Google Dataflow and Spark (http://spark.apache.org/) make iterative algorithms easy. They offer a much simpler programming paradigm, one that lets you run iterative graph-processing and machine-learning algos (http://spark.apache.org/mllib/) that are impractical on MapReduce.
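For a sense of what "iterative" means here, below is a minimal PageRank-style loop in plain Python. It does not use Spark or Dataflow. The `pagerank` function and the tiny `links` graph are made up for illustration, and the sketch assumes every node has at least one outgoing link and every link target is itself a key in `links`.

```python
def pagerank(links, iterations=20, damping=0.85):
    """Toy PageRank: every pass re-reads the whole graph and the previous ranks."""
    nodes = sorted(links)
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Each node splits its current rank evenly among its outgoing links.
        contribs = {node: 0.0 for node in nodes}
        for src, targets in links.items():
            share = ranks[src] / len(targets)
            for dst in targets:
                contribs[dst] += share
        # Combine the damping term with the contributions received.
        ranks = {
            node: (1 - damping) / len(nodes) + damping * contribs[node]
            for node in nodes
        }
    return ranks


# Hypothetical three-page graph: a links to b and c, b links to c, c links to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(ranks)
```

On raw MapReduce, each pass through that `for` loop would typically be its own job, with the ranks written out and read back in between passes. Keeping the working set in memory across passes is what makes this kind of loop cheap on something like Spark.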

For example, Spark provides the primitives needed to build GraphX (http://amplab.github.io/graphx/, http://spark.apache.org/graphx/), which is essentially GraphLab on Spark.

This has "cloud" prefixed to the name of every component. So, obviously, it's better. Also, they're selling it. So, ya know, marketing trumps engineering.