I found the introductory chapters (1-3) of this book [0] quite good. It differs from the NVIDIA CUDA C++ guide in that it uses modern C++ and has non-trivial real-world examples.
I also wrote a blog post [1] that uses CUDA to build a simple CNN inference module, which you might find useful.