If you like textbooks, I would recommend "Programming Massively Parallel Processors" by Hwu, Kirk, and El Hajj [1]. The most recent (fourth) edition was published in 2022.
If you prefer lecture videos, I would recommend El Hajj's YouTube playlist of his 2021 lectures [2]. He works through a subset of the textbook.
This will give you a good foundation in GPU hardware architecture and CUDA programming. Much of that knowledge also transfers to other areas of high-performance computing.
Thank you, that lecture series is really great. The second lecture in particular shows exactly how it's done, at least the basics, with a minimal and very well-explained example. I'm looking forward to watching the entire playlist.