The approach differs slightly depending on your host operating system. A virtualisation layer on your desktop, such as VirtualBox or Parallels, also adds constraints: GPUs can be passed through to the guest, but setting that up is somewhat painful.
The first stage is to take a vanilla CoreOS host and inject the CUDA drivers, which is a one-time process. After that, the GPU devices are retained across reboots and can be mapped into Docker containers.
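As a rough illustration of the mapping step, the sketch below builds a `docker run` command that passes each NVIDIA device node on the host into a container using Docker's `--device` flag. It assumes the driver has already created nodes under `/dev/nvidia*` (typically `/dev/nvidia0`, `/dev/nvidiactl` and `/dev/nvidia-uvm`); the helper names, the image name `my-cuda-image` and the `nvidia-smi` command are placeholders for illustration, not something this guide prescribes.

```python
import glob
import os
import shlex


def find_nvidia_devices(pattern="/dev/nvidia*"):
    """Return NVIDIA device nodes present on the host, sorted, skipping directories."""
    return sorted(p for p in glob.glob(pattern) if not os.path.isdir(p))


def build_docker_run(devices, image, command):
    """Build a `docker run` argument list that maps each host device into the container."""
    args = ["docker", "run", "--rm"]
    for dev in devices:
        # --device HOST_PATH:CONTAINER_PATH exposes a host device node in the container.
        args += ["--device", f"{dev}:{dev}"]
    args.append(image)
    args += list(command)
    return args


if __name__ == "__main__":
    devices = find_nvidia_devices()
    # "my-cuda-image" and "nvidia-smi" are hypothetical; substitute an image that
    # ships CUDA userland tools matching the driver installed on the host.
    cmd = build_docker_run(devices, "my-cuda-image", ["nvidia-smi"])
    # Print the command rather than running it, so it can be inspected first.
    print(" ".join(shlex.quote(a) for a in cmd))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without a GPU; on such a host the device list is simply empty and no `--device` flags are emitted.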