by shub 2876 days ago
If you're using Spark's built-in (standalone) scheduler, then the cluster manager is a SPOF. The Hadoop docs say you can run YARN's ResourceManager in active/standby mode, though I haven't tried it myself. Spark can also use Kubernetes (k8s) and Nomad to schedule executors, and those have HA modes as well. I assume Mesos does HA too.
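For the YARN route, the Hadoop ResourceManager HA docs turn on active/standby mode through properties in yarn-site.xml. Here's a minimal sketch that just renders those properties with Python's standard library. It assumes Hadoop 3 property names (older releases used yarn.resourcemanager.zk-address instead of hadoop.zk.address), and the hostnames master1, master2 and zk1 through zk3 plus the cluster id are hypothetical. Check the docs for your version before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-RM setup; hostnames and cluster id are placeholders.
HA_PROPERTIES = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.cluster-id": "cluster1",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
    "yarn.resourcemanager.hostname.rm1": "master1",
    "yarn.resourcemanager.hostname.rm2": "master2",
    # ZooKeeper is used for leader election / state store (Hadoop 3 name).
    "hadoop.zk.address": "zk1:2181,zk2:2181,zk3:2181",
}


def build_yarn_site(props: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_yarn_site(HA_PROPERTIES))
```

Generating the file isn't the point; the property list is what you'd paste into yarn-site.xml on both ResourceManager hosts.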

You're still boned if the driver dies. I'm pretty sure the driver keeps some important state in RAM, so if the node hosting it goes down you have to restart the job from the beginning, even if the cluster manager restarts the driver.
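To make the "state in RAM" point concrete, here's a toy model. It is not Spark's implementation; the ToyDriver class and stage names are hypothetical. It only illustrates why a driver that tracks progress purely in process memory has to redo everything when a fresh driver process is started in its place.

```python
# Toy model, NOT Spark's actual driver: progress lives only in this object's memory.
class ToyDriver:
    def __init__(self, stages):
        self.stages = list(stages)
        self.completed = set()  # in-process state; gone if the process dies

    def run(self, crash_after=None):
        """Run stages in order, optionally simulating a crash after N stages.
        Returns the stages actually executed during this call."""
        executed = []
        for stage in self.stages:
            if stage in self.completed:
                continue
            if crash_after is not None and len(executed) == crash_after:
                raise RuntimeError("driver node lost")
            executed.append(stage)
            self.completed.add(stage)
        return executed


stages = ["read", "shuffle", "aggregate", "write"]
first = ToyDriver(stages)
try:
    first.run(crash_after=2)  # finishes "read" and "shuffle", then dies
except RuntimeError:
    pass

# The cluster manager restarts the driver, but it's a brand-new process
# with no memory of what the first one finished:
restarted = ToyDriver(stages)
print(restarted.run())  # ['read', 'shuffle', 'aggregate', 'write']
```

Avoiding that would mean persisting progress somewhere outside the driver process, which this toy deliberately doesn't do.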