Hacker News
by siliconpotato 212 days ago
> We even had an issue that SLURM uses 4 ports per job for the duration of the job, so you can't actually run more than a few thousand jobs simultaneously because the controller runs out of TCP ports!

That sounds concerning. Do you have a link to a bug report for this, please? Is the TCP port problem on the compute node side or the controller side?


The controller side. I don't think it is a bug; that's just how they designed it.
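A rough back-of-the-envelope check of the "few thousand jobs" ceiling. This sketch assumes two things the thread does not state: that each per-job port must be a distinct local port on the controller, and that those ports come from Linux's default ephemeral range (`net.ipv4.ip_local_port_range = 32768 60999`). Only the figure of 4 ports per job comes from the thread.

```python
# Upper bound on concurrent jobs if each job pins a fixed number of controller ports.
# ASSUMPTION: ports come from Linux's default ephemeral range 32768-60999 and
# stay held for the whole job. The range is tunable, so treat this as an estimate.

PORTS_PER_JOB = 4  # figure quoted in the thread


def max_concurrent_jobs(low: int = 32768, high: int = 60999,
                        ports_per_job: int = PORTS_PER_JOB) -> int:
    """Return how many jobs fit before the port range is exhausted."""
    available = high - low + 1  # the range is inclusive at both ends
    return available // ports_per_job


if __name__ == "__main__":
    print(max_concurrent_jobs())             # 28232 ports // 4 -> 7058
    print(max_concurrent_jobs(1024, 65535))  # whole unprivileged range -> 16128
```

Under these assumptions the ceiling lands around 7,000 simultaneous jobs, which is consistent with "a few thousand"; widening the port range raises it but does not remove it.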

They want you to use array jobs for large batches of jobs, or to submit jobs in a fire-and-forget way.
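A minimal sketch of the array-job approach: one `sbatch` call with `--array` replaces thousands of separate submissions. The helper `array_submit_cmd` and the script name `job.sh` are hypothetical; `--array` (with its optional `%N` cap on concurrently running tasks), `--parsable`, and the `SLURM_ARRAY_TASK_ID` environment variable are standard Slurm features. The sketch only builds the command line; it does not run it.

```python
import shlex
from typing import List, Optional


def array_submit_cmd(script: str, n_tasks: int,
                     max_running: Optional[int] = None) -> List[str]:
    """Build a single sbatch command that submits n_tasks as one array job.

    Each task can read its index (0..n_tasks-1) from SLURM_ARRAY_TASK_ID.
    The optional %max_running suffix limits how many tasks run at once.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be >= 1")
    spec = f"0-{n_tasks - 1}"
    if max_running is not None:
        spec += f"%{max_running}"
    # --parsable makes sbatch print only the job ID, which is convenient in scripts.
    return ["sbatch", "--parsable", f"--array={spec}", script]


if __name__ == "__main__":
    # One submission for 5000 tasks, at most 200 running at a time.
    print(shlex.join(array_submit_cmd("job.sh", 5000, 200)))
    # -> sbatch --parsable --array=0-4999%200 job.sh
```

`sbatch` itself is fire-and-forget: it returns as soon as the controller has queued the job and assigned an ID, rather than staying attached the way `srun` (or `sbatch --wait`) does.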