| HN Mirror



	by basyt 4417 days ago
	It's fundamentally the same thing as MapReduce, isn't it? Can someone explain the differences to me, please? There isn't much of use in the article.

3 comments

dyoo1979 4417 days ago

You'll probably want to read the FlumeJava paper. http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/F...

Citation: http://dl.acm.org/citation.cfm?id=1806638

The key word is pipeline. If you have some analysis that runs in several stages, you'll be taking the output of one stage, and connecting it to the next. If you want to compose multiple phases, chained together, raw MapReduce isn't going to help you very much with the chaining.

What's described in the paper is a way to do the chaining in a nice way. The system will take care of writing the raw MapReduces for you. But it'll also do a lot of work on the interconnections between your stages as well.
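To make the chaining point concrete, here is a minimal sketch in plain Python. It is not the FlumeJava API; the `Pipeline` class and its `map` / `group_and_reduce` methods are hypothetical names invented for illustration. The idea it tries to show is that stages are only recorded when you chain them, and a single `run` call wires each stage's output into the next, which is the plumbing you would otherwise write by hand between raw MapReduce jobs.

```python
from collections import defaultdict
from functools import reduce


class Pipeline:
    """Hypothetical deferred pipeline: chaining records stages, run() executes them."""

    def __init__(self, stages=()):
        self.stages = tuple(stages)

    def map(self, fn):
        # Record a per-record transformation; nothing runs yet.
        return Pipeline(self.stages + (("map", fn),))

    def group_and_reduce(self, key_fn, value_fn, reduce_fn):
        # Record a shuffle-and-reduce step, roughly one MapReduce job.
        return Pipeline(self.stages + (("reduce", (key_fn, value_fn, reduce_fn)),))

    def run(self, records):
        data = list(records)
        for kind, arg in self.stages:
            if kind == "map":
                data = [arg(r) for r in data]
            else:
                key_fn, value_fn, reduce_fn = arg
                groups = defaultdict(list)
                for r in data:
                    groups[key_fn(r)].append(value_fn(r))
                # The output of this stage becomes the input of the next one.
                data = [(k, reduce(reduce_fn, vs)) for k, vs in groups.items()]
        return data


# Stage 1 counts words; stage 2 counts how many words share each count.
pipeline = (
    Pipeline()
    .map(str.lower)
    .group_and_reduce(lambda w: w, lambda w: 1, lambda a, b: a + b)
    .group_and_reduce(lambda kv: kv[1], lambda kv: 1, lambda a, b: a + b)
)

histogram = pipeline.run(["a", "B", "a", "c", "b", "a", "d"])
```

Because the whole chain is known before anything runs, a real system has room to optimize across stage boundaries. This toy version just runs the stages in order.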


espeed 4417 days ago

MapReduce wasn't designed for iterative algorithms or streaming data, whereas Google Dataflow and Spark (http://spark.apache.org/) make iterative algorithms easy. It's a much simpler programming paradigm, and it allows you to do iterative graph-processing and machine-learning algos (http://spark.apache.org/mllib/) that are impractical on MapReduce.

For example, Spark provides the primitives needed to build GraphX (http://amplab.github.io/graphx/, http://spark.apache.org/graphx/), which is essentially GraphLab on Spark.
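As a rough illustration of what "iterative" means here, the sketch below runs a simplified PageRank in plain Python. It does not use Spark or GraphX; the `pagerank` function, the tiny `links` graph, and the parameter defaults are all assumptions made up for the example. Each pass of the loop is one "spread contributions, then sum per page" step. With raw MapReduce, each pass would typically be its own job, with the ranks written out and read back in between, whereas here they simply stay in memory from one iteration to the next.

```python
def pagerank(links, iterations=20, damping=0.85):
    """Simplified PageRank over a dict of page -> list of outgoing links.

    Assumes every page has at least one outgoing link and every link
    target is itself a key in `links`.
    """
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        # "Map" step: each page spreads its rank evenly over its out-links.
        contribs = {page: 0.0 for page in links}
        for page, outs in links.items():
            for dest in outs:
                contribs[dest] += ranks[page] / len(outs)
        # "Reduce" step: combine contributions into the next round's ranks.
        ranks = {p: (1 - damping) + damping * c for p, c in contribs.items()}
    return ranks


# Hypothetical three-page graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```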


njharman 4417 days ago

This has "cloud" prefixed to the name of every component. So, obviously, it's better. Also, they're selling it. So, ya know, marketing trumps engineering.
