Hacker News
by fangpenlin 479 days ago
There's a bug in k8s-device-plugin that stops the plugin from even launching, as I mentioned in the article:

https://github.com/NVIDIA/k8s-device-plugin/issues/1182

And I opened a PR to fix it here:

https://github.com/NVIDIA/k8s-device-plugin/pull/1183

I am not sure whether this bug affects only NixOS, whose library paths and other quirks differ from those of the major Linux distros.

Another major problem was that the "default_runtime_name" setting in the containerd config didn't work as expected. I had to create a RuntimeClass and assign it to the pod to make it pick up the NVIDIA runtime.
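
A minimal sketch of that RuntimeClass workaround, assuming containerd already registers a runtime named "nvidia" (the handler name, pod name, and image below are illustrative assumptions, not taken from the article). The RuntimeClass "handler" must match the runtime name in the containerd config, and the pod opts in through "runtimeClassName" instead of relying on "default_runtime_name". It prints both objects as a JSON List, which kubectl can apply:

```python
import json

# Assumption: containerd's CRI config defines a runtime named "nvidia".
# The RuntimeClass handler must match that runtime name exactly.
RUNTIME_HANDLER = "nvidia"


def runtime_class(name: str = "nvidia", handler: str = RUNTIME_HANDLER) -> dict:
    """Build a node.k8s.io/v1 RuntimeClass manifest."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def gpu_pod(name: str, image: str, runtime_class_name: str = "nvidia") -> dict:
    """Build a Pod that selects the RuntimeClass and requests one GPU."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pick the NVIDIA runtime explicitly, per pod.
            "runtimeClassName": runtime_class_name,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Resource name advertised by the NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and placeholder image; replace with your own.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [runtime_class(), gpu_pod("gpu-test", "example.com/cuda-app:latest")],
    }
    print(json.dumps(manifests, indent=2))
```

Usage (hypothetical file name): `python gpu_manifests.py | kubectl apply -f -`.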

Other than that, I haven't tried K3s; the one I am running is a full-blown K8s cluster. I'd guess the two behave similarly.

There's no guarantee, but if you find any hints about why your NVIDIA plugin won't work, I might be able to help: I skipped some minor issues I ran into in the article, and if yours happen to be among them, I can share how I solved them.


By the way, one problem I encountered but didn't mention in the article was that libnvidia-container had trouble with the paths it uses to read the NVIDIA drivers and libraries, because NixOS uses non-FHS paths. I had to write a patch to modify those paths. I just created a Gist with the patch content:

https://gist.github.com/fangpenlin/1cc6e80b4a03f07b79412366b...

Later on, though, since I am taking the CDI route, it appears that libnvidia-container (nvidia-container-cli) isn't really used. If you go with the plain container runtime approach instead of CDI, you may need a patch like this for the libnvidia-container package.
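
For context on why the CDI route sidesteps nvidia-container-cli: a CDI-enabled runtime reads a device spec (on NVIDIA setups typically generated with `nvidia-ctk cdi generate`) and applies the "containerEdits" it lists for a device referenced by a fully qualified name such as "nvidia.com/gpu=0". The sketch below parses a small, hand-written spec (hypothetical, not real generated output) to show the naming scheme and which device nodes would be injected. The helper names are made up for illustration, and it simply concatenates spec-wide and per-device edits rather than modeling any specific runtime's merge logic:

```python
import json

# Hypothetical, hand-written CDI spec for illustration only.
EXAMPLE_SPEC = json.loads("""
{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {"name": "0", "containerEdits": {"deviceNodes": [{"path": "/dev/nvidia0"}]}},
    {"name": "all", "containerEdits": {"deviceNodes": [{"path": "/dev/nvidia0"}]}}
  ],
  "containerEdits": {"deviceNodes": [{"path": "/dev/nvidiactl"}]}
}
""")


def qualified_device_names(spec: dict) -> list[str]:
    """Return fully qualified CDI names of the form 'vendor/class=name'."""
    kind = spec["kind"]  # "vendor/class", e.g. "nvidia.com/gpu"
    if "/" not in kind:
        raise ValueError(f"CDI kind must be 'vendor/class', got {kind!r}")
    return [f"{kind}={dev['name']}" for dev in spec.get("devices", [])]


def device_nodes(spec: dict, device: str) -> list[str]:
    """Concatenate spec-wide device nodes with those of one named device."""
    paths = [n["path"] for n in spec.get("containerEdits", {}).get("deviceNodes", [])]
    for dev in spec.get("devices", []):
        if dev["name"] == device:
            edits = dev.get("containerEdits", {})
            paths += [n["path"] for n in edits.get("deviceNodes", [])]
            return paths
    raise KeyError(f"device {device!r} not found in CDI spec")


if __name__ == "__main__":
    print(qualified_device_names(EXAMPLE_SPEC))
    print(device_nodes(EXAMPLE_SPEC, "0"))
```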

Oooo, thanks for the pointers! Will be revisiting this tomorrow!