Hacker News
by krallin 4409 days ago
Note that Spark 1.0.0 makes it trivial to submit Spark jobs to an existing Hadoop cluster.

It uses HDFS to distribute archives (e.g. your app JAR) and to store results / state / logs, and YARN to schedule itself and acquire compute resources.

It's pretty amazing to see how you can use Spark's API to write functional-style applications that are then distributed across multiple executors (e.g. when you use Spark's "filter" or "map" operations, the work potentially gets split up and run on totally different nodes).
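
To make that concrete, here's a rough sketch of the idea in plain Python. This is not Spark's API: the thread pool is only a stand-in for executors on separate nodes, and the names `split_into_partitions`, `run_partition` and `distributed_filter_map` are made up for illustration. It just shows why "filter" and "map" distribute so easily: each partition can be processed without looking at any other.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def run_partition(partition):
    """Work done by one hypothetical 'executor' on its own slice of data."""
    evens = filter(lambda x: x % 2 == 0, partition)  # keep even numbers
    return [x * x for x in evens]                    # then square them


def distributed_filter_map(records, num_partitions=3):
    partitions = split_into_partitions(records, num_partitions)
    # Each partition is handled independently; threads stand in for nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        per_partition = list(pool.map(run_partition, partitions))
    # Gather the per-partition results back in one place.
    return sorted(value for part in per_partition for value in part)


result = distributed_filter_map(list(range(10)))
print(result)
```

In real Spark the partitions would live on different machines and the closures would be shipped to them, but the shape of the program is the same.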

Great tool — exciting to see it reach 1.0.0!

1 comment

Do you mean SIMR or something else?