by exxo_ 3683 days ago
I think you are missing the point: it has nothing to do with installing the NVIDIA drivers through Docker.

What you are showing[1] is how to install NVIDIA drivers on CoreOS the hackish way (not persistent, no driver libs, no DKMS, no UVM, no KMS...)

Regarding rkt, it's not supported at the moment but a similar approach could be taken. As for the Docker CLI wrapper, you can avoid it if you really need to.


My code here is definitely hackish, I can't argue with that. But I'm hard pressed to see how running a container to activate a driver is hackish when the alternative on the table is modifying the Docker CLI and requiring a Docker plugin.

I run the driver container at startup and never shut it down. How is this not persistent? DKMS and other build/deployment choices are not ruled out by my approach, so I'm not sure that's relevant.

Looking more deeply at the "Why NVIDIA Docker" page in the repo wiki doesn't help either. In fact, it doesn't really explain why Docker itself must be modified. The only real explanation is lack of container portability, but driver containers are portable within the scope of a given kernel version. A modified Docker CLI and a plugin requirement are certainly much less portable.

It seems to me that someone at NVIDIA simply didn't realize they could run a container in privileged mode and effectively install the driver system-wide for all containers.

If you want more insight, I suggest reading the "Internals" section.

I'm not going to dwell on the details, but there are many reasons why doing so can go horribly wrong. Believe me, we (NVIDIA) evaluated our options and know the implications of running our drivers within containers.

Do you really know what --privileged does? If so, you know that there is no such thing as "installing the driver system-wide". For that you would have to circumvent the namespaces and a bunch of other mechanisms that Docker puts in place.
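To make the namespace point concrete: --privileged grants extra capabilities and device access, but the container still gets its own mount, PID and other namespaces unless flags such as --pid=host are also passed. One way to check is to compare the namespace identifiers under /proc/<pid>/ns on the host and inside the container. A minimal Linux-only sketch (the helper name namespace_ids is hypothetical, not part of Docker or nvidia-docker):

```python
import os

# Namespace types exposed under /proc/<pid>/ns on Linux.
NAMESPACES = ("mnt", "pid", "net", "ipc", "uts")

def namespace_ids(pid="self"):
    """Return {ns: link target such as 'mnt:[4026531840]'} for a process.

    Hypothetical helper for illustration. Values are None when the
    entry cannot be read (non-Linux system, or insufficient permission).
    """
    ids = {}
    for ns in NAMESPACES:
        path = f"/proc/{pid}/ns/{ns}"
        try:
            ids[ns] = os.readlink(path)
        except OSError:
            ids[ns] = None
    return ids

if __name__ == "__main__":
    for ns, ident in namespace_ids().items():
        print(f"{ns:4} {ident}")
```

Running it on the host and then inside a container started with `docker run --privileged` would typically print different mnt:[...] and pid:[...] values, i.e. files the container writes to its own root filesystem are not visible to other containers.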

"portable within the scope of a given kernel [and driver] version"

Well that's not what I call portable :) With nvidia-docker you can build a CUDA image on your laptop and deploy it anywhere in the cloud or on premises without a single modification.

Ok great, thanks! I'll check this out when we run into these issues.