by nchammas 4276 days ago

> Spark _requires_ Hadoop to run

This is not correct. Spark uses the Hadoop Input/Output API, but you don't need any Hadoop component installed to run Spark, not even HDFS.

You can -- and many companies do -- run Spark on Mesos or on Spark's standalone cluster manager, and use S3 as their storage layer.
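To make that concrete, here is a minimal sketch of how such a job launch might be assembled. The host names, bucket, and script name are hypothetical placeholders, and the `s3a://` scheme assumes the S3 connector library is on Spark's classpath (a client library, not an installed Hadoop service). The `spark://` and `mesos://` master URL prefixes and the `--master` flag are standard `spark-submit` usage.

```python
def spark_submit_command(master_url, app_path, input_uri):
    """Build a spark-submit argument list for a non-YARN cluster manager.

    Nothing in the resulting command points at HDFS or YARN.
    """
    # Accept only Spark standalone, Mesos, or local masters in this sketch.
    if not master_url.startswith(("spark://", "mesos://", "local")):
        raise ValueError("unsupported master URL: " + master_url)
    return ["spark-submit", "--master", master_url, app_path, input_uri]


# Hypothetical hosts, script, and bucket, for illustration only.
standalone = spark_submit_command(
    "spark://master.example.com:7077", "wordcount.py", "s3a://example-bucket/logs/"
)
mesos = spark_submit_command(
    "mesos://mesos.example.com:5050", "wordcount.py", "s3a://example-bucket/logs/"
)

print(" ".join(standalone))
print(" ".join(mesos))
```

Either command targets a cluster manager other than YARN and reads its input from S3 rather than HDFS.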

> this whole Spark vs Hadoop debate makes no sense whatsoever

If we talk about Hadoop as an ecosystem of tools, then yes, it doesn't make sense to frame Spark as a competitor. Spark is part of that ecosystem.

But if we talk about Hadoop as Hadoop 1 MapReduce or as Hadoop 2 Tez, both of which are execution engines, then it very much makes sense to pit Spark against them as an alternative execution engine.

Granted, Hadoop 1 MapReduce is pretty old compared to Spark, and Tez is still under heavy development, but these are alternatives and not complements to Spark.

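(Note: In Hadoop 2, MapReduce is just an application framework that runs on YARN. Tez is a separate execution engine, also running on YARN, that tools such as Hive and Pig can use in place of MapReduce.)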

> I just don't understand this constant boasting about Spark, it seems very suspicious to me.

Suspicious how?

I think Spark's elegant API, unified data processing model, and performance -- all of which are documented very well in demos and benchmarks online -- merit the excitement that you see in the "Big Data" community.