by IceWreck 1114 days ago
The solution is to not install CUDA on your base system, because you need multiple versions of CUDA and some of them are often incompatible with your distro-provided GCC.

Here is what works for me:

- Nvidia drivers on the base Linux system (rpmfusion/fedora in my case)

- Install the Nvidia Container Toolkit

- Use a CUDA base container image and run all your code inside podman or docker
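
As a rough sketch of the last step, here is a small Python helper that builds the command line for running something inside a CUDA base image. The helper name `gpu_run_command` and the image tag are my own assumptions for illustration, not anything from the comment above; pick a tag that matches your driver. The flags are the ones the Nvidia Container Toolkit docs describe as far as I know: `--gpus all` for docker, and the CDI device name `nvidia.com/gpu=all` for podman (which assumes you have already generated a CDI spec, e.g. with `nvidia-ctk cdi generate`).

```python
# Assumed image tag, for illustration only; choose one compatible with your driver.
DEFAULT_IMAGE = "nvidia/cuda:12.2.0-base-ubuntu22.04"


def gpu_run_command(engine, image=DEFAULT_IMAGE, command=("nvidia-smi",)):
    """Build the argv list for running `command` in a GPU-enabled container.

    This only builds the command; it does not execute anything.
    """
    if engine == "docker":
        # docker exposes GPUs through the --gpus flag
        gpu_flags = ["--gpus", "all"]
    elif engine == "podman":
        # podman uses a CDI device name (needs a generated CDI spec on the host)
        gpu_flags = ["--device", "nvidia.com/gpu=all"]
    else:
        raise ValueError(f"unsupported container engine: {engine!r}")
    return [engine, "run", "--rm", *gpu_flags, image, *command]


# Print the command instead of running it, so this is safe on a machine without a GPU.
print(" ".join(gpu_run_command("podman")))
```

To actually run it, pass the returned list to `subprocess.run(..., check=True)`. If `nvidia-smi` prints your GPU from inside the container, the driver and toolkit setup is working and the CUDA version on the host no longer matters much.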


I admit it's been a while (2 years) since I last played with Nvidia/CUDA (on Jetson), and back then running CUDA inside Docker was still somewhat arcane. In my experience, whatever the Nvidia documentation lays out works well until you want to 1) cut down on container image size (important for caches and build pipelines) and, to this end, understand what individual deb packages and libraries do, or 2) run the container on a system different from the official Nvidia Ubuntu image.

Back then the docs were just awful. Has this really changed that much in recent times?

Containers have always come in different flavors that represent their sizes and capabilities. For example, runtime containers have the bare minimum to get the application running but none of the debug tools.
The docs are still terrible. Coupled with the AWS / GCP docs around these things, that makes it nearly impossible to get this stuff to work without investing a significant amount of time.