1. play around with the NVPTX LLVM backend and/or try compiling CUDA with Clang,
2. get familiar with the PTX ISA,
3. play around with ptxas + nvdisasm (assemble PTX into a cubin, then disassemble that to SASS).
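
A minimal sketch of how the three steps above can fit together on one tiny kernel. It assumes a CUDA toolkit is installed where Clang can find it and that `sm_75` suits your GPU and toolkit; adjust as needed. The file name `add_one.cu` and the kernel `add_one` are made up for illustration, and the `__global__` fallback macro is only there so the same file also builds as plain C++ for a quick host-side check.

```cpp
// add_one.cu (hypothetical file name)
//
// Step 1: let Clang's NVPTX backend emit PTX for the device side only:
//   clang++ --cuda-device-only --cuda-gpu-arch=sm_75 -O2 -S add_one.cu -o add_one.ptx
//
// Step 2: read add_one.ptx next to the PTX ISA document.
//
// Step 3: assemble the PTX into a cubin, then disassemble it to SASS:
//   ptxas -arch=sm_75 add_one.ptx -o add_one.cubin
//   nvdisasm add_one.cubin

// Fallback so a plain C++ compiler accepts the file too (illustration only).
#if !defined(__CUDACC__) && !defined(__CUDA__)
#define __global__
#endif

// extern "C" keeps the symbol unmangled, so the entry point is easy to
// spot in the PTX and SASS listings.
extern "C" __global__ void add_one(int *p) {
    *p += 1;  // one load, one add, one store
}
```

With optimization on, the PTX for this kernel should be only a handful of instructions (expect something like a parameter load, a global load, an add, and a global store), which makes it easy to line up each one with its PTX ISA entry and then with the SASS that `nvdisasm` prints.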