For the P3 (Volta V100) instances you'll want to ensure you use an AMI preloaded with CUDA 9, though not all DL frameworks are happy with that yet.
https://aws.amazon.com/amazon-ai/amis/
CUDA 8 programs will run, but terribly slowly: CUDA 8 can't generate Volta machine code, so their GPU code gets JIT-compiled at load time without any optimization for Volta. You want the CUDA 9 AMI version (https://aws.amazon.com/marketplace/pp/B076TGJHY1?qid=1509090...), but it currently only has MXNet and TF.
If you need other frameworks, there's the NVIDIA AMI (https://aws.amazon.com/marketplace/pp/B076K31M1S?qid=1509090...) and Volta-optimized containers for NVCaffe, Caffe2, CNTK, DIGITS, MXNet, PyTorch, TensorFlow, Theano, Torch, and CUDA 9/cuDNN 7/NCCL.
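
If you want to sanity-check whichever AMI or container you end up on, here's a rough sketch that reads the toolkit release out of `nvcc --version` text and tells you whether it's CUDA 9 or newer. The function names and the constant are mine, purely for illustration, and it assumes the output contains the usual "release X.Y" string. Note it only looks at the installed toolkit, not at what your framework binaries were actually built against.

```python
import re

# Assumption for this sketch: CUDA 9.0 is the first release that targets Volta (V100).
MIN_CUDA_FOR_VOLTA = (9, 0)


def parse_cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_targets_volta(nvcc_output):
    """True if the reported toolkit release is at least CUDA 9.0."""
    release = parse_cuda_release(nvcc_output)
    return release is not None and release >= MIN_CUDA_FOR_VOLTA
```

On the instance you could feed it `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`, assuming `nvcc` is on the PATH.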