by somethingsome
29 days ago
Not really. Hardware didn't change that much. Of course you won't find Tensor or ray-tracing cores, but you will get a very solid grasp of GPU programming and the CUDA language (which didn't change that much either), and then you can easily learn the more modern things from blog posts or even, at worst, ChatGPT.

I would separate the knowledge into maybe 3 distinct buckets:
The baseline: device/host boundary, SIMT programming, etc.
The intermediate: kernel architecture, CUDA Graphs vs. persistent kernels, warp specialisation, divergence-avoidance techniques, etc.
The advanced: architecture specifics, so tcgen05, TMA, SMEM/HBM, memory-throughput vs. compute biases in various architecture implementations, GEMM, FMHA, all the tricks that make modern fused kernels very fast. I would also bucket most of GPUDirect RDMA/GPUNetIO/friends here.
The baseline hasn't changed much and probably won't. The intermediate knowledge has also stayed pretty reliably stable for ~10 years, with only things like graphs changing stuff. Tile might become more relevant than it is today, but for now CUDA, cuBLAS, and friends are where it's worth investing your time.
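
To make the "SIMT" and "divergence avoidance" items a bit more concrete, here is a rough CPU-only toy model in plain C++ (so it runs without a GPU) of how a 32-lane warp handles an `if/else`. The names `Warp`, `BranchResult` and `run_branch` are made up for this sketch, and the model is deliberately simplified: real hardware and compilers are more nuanced (short branches are often predicated, and Volta and later have independent thread scheduling), so treat it as intuition only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Lanes per warp on NVIDIA GPUs.
constexpr std::size_t kWarpSize = 32;

// One value per lane (hypothetical type for this sketch).
using Warp = std::array<int, kWarpSize>;

struct BranchResult {
    Warp out;    // per-lane results
    int passes;  // how many sides of the branch the warp had to issue
};

// Toy model of one warp executing:
//   if (in % 2 == 0) out = in * 2; else out = in + 1;
// All lanes share one instruction stream, so each side of the branch is
// issued once for the whole warp, with only the lanes in its mask active.
BranchResult run_branch(const Warp& in) {
    // Build the active mask for the "then" side: bit i set if lane i takes it.
    std::uint32_t taken = 0;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if (in[lane] % 2 == 0) taken |= (1u << lane);
    }
    const std::uint32_t not_taken = ~taken;

    BranchResult r{};
    if (taken != 0) {  // at least one lane needs the "then" side
        ++r.passes;
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            if (taken & (1u << lane)) r.out[lane] = in[lane] * 2;
    }
    if (not_taken != 0) {  // at least one lane needs the "else" side
        ++r.passes;
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            if (not_taken & (1u << lane)) r.out[lane] = in[lane] + 1;
    }
    return r;
}
```

If every lane in the warp agrees on the condition, `passes` is 1; if lanes disagree, it is 2 and each pass runs with some lanes idle. That idle time is what divergence-avoidance techniques (e.g. arranging data so neighbouring threads take the same path) try to reduce.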