by lhuser123 338 days ago
And make it more complicated than K8s

Not possible
The platforms I've seen live on top of Kubernetes, so I'm afraid it is possible: nvidia-docker, all the CUDA libraries and drivers, NCCL, vLLM, ... Large-scale distributed training and inference are complicated beasties, and so is the orchestration for them.