Show HN: Slurmq – GPU quota enforcement for Slurm
(dedalus-labs.github.io)
3 points
by windsor
178 days ago
When I was a student at Princeton, we had a neat Slurm quota management tool (https://github.com/klieret/pli-slurm-tool) that prevented people from hogging all the GPUs. It was specific to Princeton’s clusters, though, so I made a generalized version for everyone to use: slurmq.

Slurm's built-in fairshare only deprioritizes heavy users. Sometimes you need a hard cap. slurmq tracks GPU-hours per user over a rolling window and kills jobs when they exceed the quota set by the sysadmin.

Some quick commands:

$ pip install slurmq
$ slurmq check # check your quota
$ slurmq report # admin: see who's over limit
$ slurmq monitor --once --enforce # cron: warn, then cancel
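
To make the rolling-window idea concrete, here is a minimal sketch of the accounting, not slurmq's actual code. `JobRecord`, `gpu_hours_in_window`, and `over_quota` are hypothetical names used only for illustration, and the sketch assumes that a job straddling the window boundary counts only for the part that falls inside the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRecord:
    """Hypothetical accounting record for one job (not slurmq's data model)."""
    user: str
    gpus: int
    start: datetime
    end: datetime


def gpu_hours_in_window(jobs, user, now, window):
    """Sum GPU-hours for `user`, counting only time inside [now - window, now]."""
    window_start = now - window
    total = 0.0
    for job in jobs:
        if job.user != user:
            continue
        # Clip the job's runtime to the rolling window.
        start = max(job.start, window_start)
        end = min(job.end, now)
        if end > start:
            total += job.gpus * (end - start).total_seconds() / 3600
    return total


def over_quota(jobs, user, now, window, quota_hours):
    """True if the user's usage in the window exceeds the hard cap."""
    return gpu_hours_in_window(jobs, user, now, window) > quota_hours
```

With a 7-day window, a job that started before the window opened contributes only its hours after the window start, so old usage ages out gradually rather than all at once.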

Docs: https://dedalus-labs.github.io/slurmq
Source: https://github.com/dedalus-labs/slurmq

Hoping this helps other HPC sysadmins. We're using it internally and would love to hear how others handle GPU quota enforcement.