by samnco 3352 days ago
Canonical will officially support GPUs when they land GA upstream. The flag is currently beta in the Canonical Distribution of Kubernetes. Paying customers of either the managed or the supported solution get best-effort GPU support, and the feature is enabled by default.
1 comments

What is the requirement for privileged containers? The post never explains it.
Privileged containers are required for the GPU to be shared with the containers.

By default, the bundle comes with an "auto" tag, which activates privileged containers only when GPUs are detected.

You can enforce "false" to remove that, but then you won't be able to run GPU workloads.

Or you can enforce "yes" and have them activated all the time.
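To make the three settings concrete, here is a minimal sketch of the decision as described above. The function name and the `gpu_detected` argument are made up for illustration, and the accepted strings are just the ones quoted in this comment; this is not the bundle's actual code.

```python
def privileged_enabled(setting, gpu_detected):
    """Return True if privileged containers would be allowed.

    Hypothetical helper: `setting` takes the values quoted in the
    comment above ("auto", "false", "yes"); the real bundle may
    spell or validate them differently.
    """
    if setting == "auto":
        # Only turn privileged mode on when a GPU was found on the node.
        return gpu_detected
    if setting == "false":
        # Never allow privileged containers (so no GPU workloads).
        return False
    if setting == "yes":
        # Always allow privileged containers, GPU or not.
        return True
    raise ValueError("unknown setting: %r" % (setting,))


if __name__ == "__main__":
    for setting in ("auto", "false", "yes"):
        for gpu in (False, True):
            print(setting, gpu, privileged_enabled(setting, gpu))
```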

Does that answer the question? Not sure if I understood it right.

The Kubernetes docs don't say anything about having to use privileged containers for GPU support. Privileged containers are given tens of Linux capabilities; which of those are actually needed in your setup? Or, conversely, which specific step would fail for an unprivileged container?

Merely wanting to use a GPU shouldn't require the power to change the clock, switch UIDs, chown files, mess with logs, reboot the machine, etc.
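One way to check which of those powers a container actually holds is to decode the `CapEff` mask from `/proc/self/status` inside it. A rough sketch follows: the bit numbers come from `<linux/capability.h>`, but the table lists only the capabilities named in this comment, and the helper names are my own.

```python
# Capability bit numbers from <linux/capability.h>. Only the ones named
# in this thread are listed, so this is deliberately not a full table.
CAPS = {
    0: "CAP_CHOWN",      # chown files
    7: "CAP_SETUID",     # switch UIDs
    22: "CAP_SYS_BOOT",  # reboot the machine
    25: "CAP_SYS_TIME",  # change the clock
    34: "CAP_SYSLOG",    # privileged kernel log operations
}


def decode_caps(mask):
    """Return the names from CAPS whose bit is set in `mask`."""
    return [name for bit, name in sorted(CAPS.items()) if mask & (1 << bit)]


def effective_mask(status_path="/proc/self/status"):
    """Read this process's effective capability mask, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("CapEff:"):
                    # The field is a hexadecimal bitmask without a 0x prefix.
                    return int(line.split()[1], 16)
    except OSError:
        pass
    return None


if __name__ == "__main__":
    mask = effective_mask()
    if mask is None:
        print("CapEff not available on this system")
    else:
        print(hex(mask), decode_caps(mask))
```

Running this in a privileged and in an unprivileged container and comparing the two lists would show exactly what the flag adds.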

Since the GPU libraries are hosted on the node, the privileged flag is typically required to make them available to containers. I'm sure there will be improvements that remove the need for privileged mode, but today it's mostly a requirement to get anything useful out of containers tapping into the GPU.

That said, if you set the allow-privileged flag to false, GPU drivers will still be installed, but you may not be able to make use of the CUDA cores.

That's weird, because all the times I tried the experimental support, it didn't need privileged containers. From the YAML files, it looks like it's using hostPath directories, but those don't require special privileges, unless you need to write to them:

https://kubernetes.io/docs/concepts/storage/volumes/#hostpat...

I suspect that there is a bug somewhere.

Ah, wait:

https://github.com/madeden/blogposts/blob/master/k8s-gpu-clo...

You don't need to mount the /dev entries into the container at all. The experimental support creates them automatically for you when you are using GPU resources. Perhaps it's the device nodes, not the libraries, that required privileges?
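For reference, here is roughly what an unprivileged GPU pod looks like with the experimental support, built as a Python dict and dumped as JSON (which kubectl accepts as well as YAML). `alpha.kubernetes.io/nvidia-gpu` is the experimental resource name. The pod name, image, and host library directory are placeholders I made up, not values taken from the blog post.

```python
import json


def gpu_pod(name, image, lib_dir, gpus=1):
    """Build a Pod manifest that requests GPUs without privileged mode.

    All arguments are illustrative placeholders; `lib_dir` is wherever
    the NVIDIA libraries happen to live on the node.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Requesting the resource is what makes the kubelet expose
                # the device nodes; no /dev mounts are listed here.
                "resources": {
                    "limits": {"alpha.kubernetes.io/nvidia-gpu": gpus},
                },
                # Host libraries mounted read-only; no securityContext,
                # so the container is not privileged.
                "volumeMounts": [{
                    "name": "nvidia-libs",
                    "mountPath": "/usr/local/nvidia/lib64",
                    "readOnly": True,
                }],
            }],
            "volumes": [{
                "name": "nvidia-libs",
                "hostPath": {"path": lib_dir},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name and host path, for illustration only.
    pod = gpu_pod("gpu-test", "example/cuda-test", "/usr/lib/nvidia")
    print(json.dumps(pod, indent=2))
```

If a pod like this works on the cluster, the privileged requirement is coming from something other than the library mounts.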