by enum
237 days ago
+1. I have H100s to myself and access to more GPUs than I know what to do with on national clusters, but the Spark is much more fun, and I'm more productive on it. With two of them, you can debug shallow NCCL/MPI problems before hitting a real cluster. I sincerely love Slurm, but there's nothing like a personal computer.

As for debugging, that's where you should be allowed to spin up a small testing cluster on demand. Why can't you do that with your Slurm access?