Hacker News
by flx42_ 3625 days ago
Author of nvidia-docker here. You can definitely run multiple containers on each GPU if you want. If you find a bug, or think the documentation could be clearer, please file an issue!
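
A minimal sketch of packing several containers onto fewer GPUs, assuming nvidia-docker 1.0's `NV_GPU` environment variable (a comma-separated list of GPU indices) selects which GPUs a container sees. The container names, the GPU list and the `IMAGE` placeholder are hypothetical, and the round-robin policy is only one illustrative choice, not something nvidia-docker does for you.

```python
from itertools import cycle


def place_containers(names, gpu_indices):
    """Assign containers to GPUs round-robin; a GPU index may repeat (shared GPU)."""
    if not gpu_indices:
        raise ValueError("need at least one GPU")
    placement = {}
    for name, gpu in zip(names, cycle(gpu_indices)):
        # Assumption: NV_GPU (nvidia-docker 1.0) takes GPU indices as a string.
        placement[name] = {"NV_GPU": str(gpu)}
    return placement


if __name__ == "__main__":
    plan = place_containers(["job-a", "job-b", "job-c"], [0, 1])
    for name, env in plan.items():
        # IMAGE is a placeholder for whatever CUDA image you run.
        print(f"NV_GPU={env['NV_GPU']} nvidia-docker run --name {name} IMAGE")
```

With three containers and two GPUs, `job-a` and `job-c` both land on GPU 0 and share it.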

Awesome. Thanks for the reply and I apologize for suggesting something incorrect.

It does strike me as tricky to have to match driver versions between the host and the container. Do you know if there is any effort to eliminate that requirement?

Also, while we're chatting: is there any hope of NVIDIA open-sourcing their Linux drivers? How would such a move affect nvidia-docker?

You don't need to match the driver version between the host and the container. Actually, you shouldn't include any driver file inside the container.

All the user-level driver files required for execution are mounted into the container as a volume when it is started. This way you can deploy the same container image on any machine with NVIDIA drivers installed.

We have more details on our wiki: https://github.com/NVIDIA/nvidia-docker/wiki/Internals
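
To make the volume mechanism concrete, here is a sketch that only assembles the `docker run` argument list such a wrapper might produce; it does not call Docker. The device paths, the `nvidia_driver_<version>` volume name and the `/usr/local/nvidia` mount point are assumptions modelled on the nvidia-docker 1.0 design, and the driver version and image name in the usage line are made up. The point is that the image stays the same while only the host-specific driver volume changes.

```python
def nvidia_run_args(driver_version, gpu_indices, image, command=()):
    """Sketch of the extra `docker run` arguments a GPU wrapper might add.

    Assumption: the host's user-level driver files live in a named volume
    `nvidia_driver_<version>`, mounted read-only at /usr/local/nvidia.
    The image itself ships no driver files.
    """
    args = ["docker", "run"]
    # Control device nodes shared by all GPUs (assumed paths).
    for dev in ("/dev/nvidiactl", "/dev/nvidia-uvm"):
        args.append(f"--device={dev}")
    # One device node per selected GPU.
    for i in gpu_indices:
        args.append(f"--device=/dev/nvidia{i}")
    # Host-specific driver files, mounted when the container starts.
    args.append(f"--volume=nvidia_driver_{driver_version}:/usr/local/nvidia:ro")
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # "367.48" and "my-cuda-app" are hypothetical example values.
    print(" ".join(nvidia_run_args("367.48", [0, 1], "my-cuda-app", ["nvidia-smi"])))
```

Moving to a host with a different driver changes only the `driver_version` argument; the image name and command stay identical.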

As for your last question: I don't have any information on this topic, but either way it would not really affect nvidia-docker.

Thanks for your superb work. Is it possible to use nvidia-docker across several AWS instances to use multiple GPUs? I'd like to spread training over multiple GPUs for more speed and RAM. TensorFlow and Caffe support distributed training, but I'm not sure whether that's viable in Dockerized environments on AWS.

One container can use multiple GPUs on the same machine without problems.
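
Inside a single container, frameworks commonly use several GPUs through synchronous data parallelism: each visible GPU processes a slice of the global batch and the gradients are combined afterwards. The helper below is a framework-agnostic sketch of only the batch-splitting step; the function names are hypothetical and this is not TensorFlow's or Caffe's actual code.

```python
def shard_sizes(global_batch, num_gpus):
    """Split a global batch as evenly as possible across the GPUs in one container."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    base, extra = divmod(global_batch, num_gpus)
    # The first `extra` GPUs each take one additional example.
    return [base + 1 if i < extra else base for i in range(num_gpus)]


def shard_bounds(global_batch, num_gpus):
    """Return the (start, end) slice of the batch assigned to each GPU."""
    bounds, start = [], 0
    for size in shard_sizes(global_batch, num_gpus):
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    print(shard_bounds(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Giving the remainder to the first shards keeps per-GPU work within one example of each other, so no GPU waits much longer than the others at each synchronous step.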

For distributed training (which the official version of Caffe doesn't actually support), you would have to run one container per instance, but this is more a configuration problem at the framework level than a Docker or nvidia-docker problem.
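
As an example of that framework-level configuration, here is a sketch for TensorFlow, assuming a version that reads cluster membership from the `TF_CONFIG` environment variable (as the `tf.distribute` multi-worker strategies do). Each AWS instance runs one container and receives the same cluster list but its own task index. The host addresses and port 2222 are placeholders.

```python
import json


def tf_config_for(hosts, index, port=2222):
    """Build a TF_CONFIG JSON string for the worker running on hosts[index]."""
    if not 0 <= index < len(hosts):
        raise IndexError("index out of range")
    cluster = {"worker": [f"{h}:{port}" for h in hosts]}
    return json.dumps({"cluster": cluster, "task": {"type": "worker", "index": index}})


if __name__ == "__main__":
    hosts = ["10.0.0.11", "10.0.0.12"]  # hypothetical private IPs of two instances
    for i, host in enumerate(hosts):
        print(f"{host}: TF_CONFIG='{tf_config_for(hosts, i)}'")
```

Each instance would then pass its value with `docker run -e TF_CONFIG=...`, and the chosen port must be reachable between instances, for example by publishing it with `-p` or using `--net=host`.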