by complex1314
1177 days ago
I'm in about the same situation as OP. We have a small cluster of Power9 machines that has been unmaintained and unused for a while, so I will set it up from scratch. I've been looking into solutions that would be a good fit. For the moment we are just a few students/postdocs, so manual scheduling is feasible, but eventually we would like to make it available to other students at the institution. My candidates are also:
- slurm + ray/lightning/etc.
- determined.ai (maybe together with slurm)

Some advertise a Kubernetes setup with Kubeflow, but I would imagine that is a bit too complex for a small cluster. Does anyone else have experience with this? Any other suggestions?

To make the environments as reproducible as possible, it would be great to also have a setup based on Docker containers and maybe Nix, but I'm not sure whether that is feasible on ppc64. Guix and Spack have also come up in my searches.

Edit: typo