by kortex
1974 days ago
SLURM hits a nice sweet spot when you have a very traditional cluster: very homogeneous nodes (both hardware and software), standard logins (eg some kind of LDAP/AD), shared NFS files, trusted code.

It's an absolute pain when:
- Lots of different kinds of nodes
- anything more complex dependency wise than a handful of shared Conda envs
- anything involving docker
- anything vaguely untrusted
- any kind of partitioning worse than 3 nines e.g. connectivity or uptime instability
- anything more complex than 3-5 priority levels of scheduling

It's great if you hit that niche but it frankly struggles with the complexities of even moderately heterogeneous work loads. It's also just a bit dated feeling. Even though kube is complex, it's a joy to work with compared to SLURM. Hashicorp is even better imho.
>- Lots of different kinds of nodes
well, that's not a problem of slurm (which will happily start your process on all nodes), but of typical MPI programming. And once you are running something computationally intensive over multiple nodes today, you are still using MPI.
>- anything more complex dependency wise than a handful of shared Conda envs
you can put whatever dependencies you want on your NFS (or copy them to your node). If you're running on a single node it behaves 100% like running with a special login shell on os XYZ, so I don't know what problems happen with dependencies. The main problem would be that it doesn't include any "service discovery" beyond OpenMPI.
>- anything involving docker
have not used it, but there's enroot/singularity, the first of which is apparently dogfooded at Nvidia. It would probably need some adjustments for base images (because MPI)... As I don't know about the policy within these 5k+ cloud companies: can employees just execute any random image from dockerhub there? This seems a little dangerous...
> anything vaguely untrusted
linked to the docker case? Does kubernetes reboot nodes then? Slurm can do this. And while classical Slurm use cases definitely require a shared account (because of the shared fs), slurm should afaik merrily execute your programs even without any shared account other than slurm. You can attack this obviously, but you can attack kubernetes too, and while it gets more scrutiny it's also a byzantine collection of FANG-style requirements.
EDIT: What you can't work around is Slurm needing a comms-channel back to the controller, though you could just firewall that off (jobs don't use Slurm to communicate...). As each job can execute a Prolog-script, you can even quite simply allow traffic to flow only between the allocated nodes.
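A minimal sketch of what such a Prolog helper could compute, in Python. Everything here is an assumption for illustration: the chain name `SLURM_JOB_<id>`, the function name, and the idea that the job id and the peer IPs have already been resolved from the job's node list. It only builds `iptables` argument lists and does not execute them, and it assumes the site's base policy already drops everything else.

```python
from typing import List

# Hypothetical prefix for a per-job iptables chain (not a Slurm convention).
CHAIN_PREFIX = "SLURM_JOB"


def rules_for_job(job_id: str, controller_ip: str, peer_ips: List[str]) -> List[List[str]]:
    """Build iptables commands that accept traffic only from the controller
    and from the other nodes allocated to this job.

    Assumes the default INPUT policy is already restrictive, so no DROP
    rule is added here.
    """
    chain = f"{CHAIN_PREFIX}_{job_id}"
    cmds = [["iptables", "-N", chain]]  # create the per-job chain
    # Deduplicate and sort so the rule order is stable across runs.
    for ip in sorted(set(peer_ips) | {controller_ip}):
        cmds.append(["iptables", "-A", chain, "-s", ip, "-j", "ACCEPT"])
    # Hook the per-job chain into INPUT.
    cmds.append(["iptables", "-I", "INPUT", "-j", chain])
    return cmds


if __name__ == "__main__":
    for cmd in rules_for_job("42", "10.0.0.254", ["10.0.0.2", "10.0.0.1"]):
        print(" ".join(cmd))
```

A matching Epilog would have to flush and delete the same chain again, otherwise rules pile up from job to job.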
>- any kind of partitioning worse than 3 nines e.g. connectivity or uptime instability
that's indeed the case
>- anything more complex than 3-5 priority levels of scheduling
what kind of scheduling does kubernetes implement? I guess you could write a plugin for slurm doing that
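For reference, Slurm's stock priority/multifactor plugin already ranks jobs by a weighted sum of factors (age, fair-share, job size, QOS and so on) rather than by a few fixed levels. A rough sketch of that idea in Python; the weights and factor names below are invented for illustration and are neither Slurm's code nor its defaults:

```python
from __future__ import annotations

# Hypothetical weights, loosely modelled on the PriorityWeight* options
# in slurm.conf. The values are made up.
WEIGHTS = {"age": 1000, "fairshare": 10000, "job_size": 500, "qos": 2000}


def job_priority(factors: dict[str, float], weights: dict[str, int] = WEIGHTS) -> int:
    """Return a weighted sum of factors, each clamped to [0.0, 1.0].

    Missing factors count as 0.0.
    """
    total = 0.0
    for name, weight in weights.items():
        value = min(1.0, max(0.0, factors.get(name, 0.0)))
        total += weight * value
    return int(total)


if __name__ == "__main__":
    # A job that has waited a while and belongs to an under-served account.
    print(job_priority({"age": 0.5, "fairshare": 1.0}))
```

With weights like these the scheduler orders jobs on a continuous scale, so the number of effective priority "levels" is not limited to a handful.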
> It's great if you hit that niche but it frankly struggles with the complexities of even moderately heterogeneous work loads.
except that your points didn't pertain to this (except maybe for the dependencies, if you think about actual service-dependencies), I fully agree