
by liancheng 2531 days ago

Actually, none of these processes is essential for running Spark. You only need them if you read data from HDFS or run Spark on YARN.

Spark supports reading data from many different sources, e.g., cloud blob storage, relational databases, and NoSQL systems like Cassandra (C*) and HBase. A NameNode is only required if your data is stored on HDFS, so it is a requirement of HDFS, not of Spark itself.
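
To make that concrete: which storage system gets involved is decided by the URI scheme of the path you read from. The snippet below is only a rough, stdlib-only sketch. `needs_namenode` is a hypothetical helper, not a Spark API, and the example paths are made up. It also assumes every path carries an explicit scheme; in a real deployment a scheme-less path resolves against whatever default filesystem is configured, which may or may not be HDFS.

```python
from urllib.parse import urlparse


def needs_namenode(path: str) -> bool:
    """Hypothetical helper: True only if the path explicitly uses hdfs://.

    Assumption: paths without a scheme are treated as non-HDFS here,
    although a real cluster may resolve them to a default filesystem.
    """
    return urlparse(path).scheme.lower() == "hdfs"


# Made-up example paths, one per kind of source mentioned above.
examples = [
    "hdfs://namenode:8020/warehouse/events",  # HDFS: NameNode involved
    "s3a://my-bucket/events.parquet",         # cloud blob storage
    "jdbc:postgresql://db-host/sales",        # relational database
    "file:///tmp/events.csv",                 # local filesystem
]

for p in examples:
    print(p, "->", "needs NameNode" if needs_namenode(p) else "no NameNode")
```

Only the first path would ever touch a NameNode; the others go through their own connectors.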

As for scheduling, Spark can run in standalone mode without any YARN components. In fact, that is how Spark clusters run on Databricks.
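
The scheduling side works the same way: the cluster manager is selected by the master URL you give Spark (`local[*]`, `spark://host:7077`, `yarn`, and so on), so YARN only comes into play if you ask for it. Here is a minimal sketch of that mapping. `cluster_manager` is a hypothetical function written for illustration and the host names are placeholders; it is not how Spark parses the URL internally.

```python
def cluster_manager(master: str) -> str:
    """Hypothetical helper: map a Spark master URL to its cluster manager."""
    if master == "local" or master.startswith("local["):
        return "none (local mode)"  # everything runs in a single JVM
    if master.startswith("spark://"):
        return "standalone"         # Spark's own master/worker processes
    if master == "yarn":
        return "yarn"               # the only case needing YARN daemons
    if master.startswith("mesos://"):
        return "mesos"
    if master.startswith("k8s://"):
        return "kubernetes"
    return "unknown"


# Placeholder master URLs for illustration.
for m in ["local[*]", "spark://master-host:7077", "yarn"]:
    print(m, "->", cluster_manager(m))
```

With a `spark://` master, no ResourceManager or NodeManager needs to be running.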