by krallin
4409 days ago
Note that Spark 1.0.0 makes it trivial to submit Spark jobs to an existing Hadoop cluster. It uses HDFS to distribute archives (e.g. your app JAR) and to store results, state, and logs, and it uses YARN to schedule itself and acquire compute resources. It's pretty amazing to see how you use Spark's API to write functional-style applications that are then distributed across multiple executors (e.g. when you use Spark's "filter" or "map" operations, the work potentially gets split up and run on totally different nodes). Great tool, and exciting to see it reach 1.0.0!
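
To make the "filter"/"map" point concrete, here is a rough sketch of the idea in plain Python. It is not Spark's API: the helper names (`split_into_partitions`, `run_partition`, `filter_then_map`) are made up for illustration, and a local thread pool stands in for executors on separate nodes. It only shows the shape of the thing: the data is cut into partitions, each partition is filtered and mapped independently, and the partial results are stitched back together.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Cut a list into contiguous chunks, one per simulated executor."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_partition(partition, predicate, transform):
    """The work a single (hypothetical) executor does on its own chunk."""
    return [transform(x) for x in partition if predicate(x)]


def filter_then_map(data, predicate, transform, num_partitions=4):
    """Apply filter + map per partition in parallel, then merge in order."""
    partitions = split_into_partitions(data, num_partitions)
    # Threads are only a stand-in here; in Spark the partitions would be
    # processed by executors that may live on different machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(
            lambda p: run_partition(p, predicate, transform), partitions
        )
        return [x for part in results for x in part]


squares_of_evens = filter_then_map(
    list(range(10)), lambda x: x % 2 == 0, lambda x: x * x
)
print(squares_of_evens)  # [0, 4, 16, 36, 64]
```

The calling code never mentions partitions or workers, which is roughly what makes the real thing pleasant to use: you write the filter and the map, and the framework decides where they run.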