by jnaour 4233 days ago
Good introduction. Spark is really a project to watch in the field of data analysis on distributed architectures. We have run several benchmarks and Spark keeps its promises: it was 2.5x faster than Pig for the same algorithm on the same cluster.

For iterative algorithms that can take advantage of in-memory processing, performance is really good compared to Hadoop.

The project is still young and has several bugs, but the documentation is really good and the code is well commented and robust.

1 comments

As part of our work we have done extensive comparisons of Spark on various workloads, clusters, and cluster sizes, comparing it with Hadoop MapReduce, Naiad, and several other frameworks. We've found Spark to be temperamental, hard to configure, and wildly variable in performance, suited only to the small set of computations for which in-memory state reuse is beneficial (mostly it isn't).

In nearly every test Naiad has beaten Spark.

More info on Naiad: http://research.microsoft.com/en-us/projects/naiad/

MSFT killed Naiad's predecessor Dryad in favor of Hadoop some time ago, because Hadoop was becoming popular. The primary author linked on the page worked at "Microsoft Silicon Valley", which was just shut down, and in fact he now lists himself on LinkedIn as "Researcher At Large, previously at Microsoft".

So, how do we know Naiad has much of a future? Technologically, it may be better, more reliable, or faster, but if it's a niche product that gets desupported just because it never took off, that doesn't really matter.

Spark, on the other hand, has a great deal of momentum, and in my experience momentum and adoption trump technical elegance in the short run...

(Don't get me wrong: I thought Dryad was awesome. Google's Flume is very similar in some ways. MapReduce's days are numbered, except for a small number of problems that can't easily be ported.)

All good points. I can't say what the future of Naiad is. What they have done to Microsoft Research Silicon Valley is disgusting (I worked there too for a short time).

In our experience, the performance claims for Spark have been more hype than substance. With Naiad, on the other hand, it has been hard to find a corner case where it performs badly.

Naiad is open source, licensed under an Apache License, so one can only hope...

FYI: Link to the Naiad GitHub repo: https://github.com/MicrosoftResearch/Naiad

Thanks, but: this is a dead project.

Also, it appears to be tied to Windows (it's delivered as a VS solution).

Is this comparison work available?

With some luck, it will be published early in the new year.