by mkeeter
590 days ago
For a very deep dive into the subject, this is a great writeup: How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance (https://siboehm.com/articles/22/CUDA-MMM). It's CUDA-specific, so some aspects may not yet be portable to WGPU.
There are a few things I wasn't able to figure out how to access, or wasn't sure were possible at all. For example, a lot of Simon's article takes advantage of the warp scheduler and warp tiling.
I had a hard time finding information on whether that's even possible with my M2/Metal, and on the general memory access patterns. CUDA does seem to have better documentation in this regard.
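For context on the term: in the linked article, warp tiling is one level of a tiling hierarchy. A thread block's tile of C is split into smaller per-warp tiles, and each warp computes its patch from operands already staged in fast memory. The sketch below is a CPU-side Python illustration of that nesting only. It assumes square n x n matrices with n divisible by the (hypothetical) `block_tile` size, and `block_tile` divisible by `warp_tile`. It models no GPU hardware, scheduling, or shared memory, and it is not the article's kernel.

```python
def naive_matmul(A, B):
    """Reference triple-loop matmul for lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C


def tiled_matmul(A, B, block_tile=4, warp_tile=2):
    """Two-level tiling sketch (hypothetical parameters): an outer 'block tile'
    (analogous to a CUDA thread block's shared-memory tile) subdivided into
    'warp tiles'. Assumes n % block_tile == 0 and block_tile % warp_tile == 0."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, block_tile):          # block-tile rows of C
        for bj in range(0, n, block_tile):      # block-tile columns of C
            for bk in range(0, n, block_tile):  # walk the shared K dimension
                # On a GPU, A[bi:bi+BT][bk:bk+BT] and B[bk:bk+BT][bj:bj+BT]
                # would be loaded into shared memory at this point.
                for wi in range(bi, bi + block_tile, warp_tile):
                    for wj in range(bj, bj + block_tile, warp_tile):
                        # Each warp tile accumulates a warp_tile x warp_tile patch of C.
                        for i in range(wi, wi + warp_tile):
                            for j in range(wj, wj + warp_tile):
                                s = C[i][j]
                                for p in range(bk, bk + block_tile):
                                    s += A[i][p] * B[p][j]
                                C[i][j] = s
    return C


if __name__ == "__main__":
    n = 8
    A = [[float(i * n + j) for j in range(n)] for i in range(n)]
    B = [[float((i + 2 * j) % 5) for j in range(n)] for i in range(n)]
    # Integer-valued floats sum exactly, so both orderings agree bit-for-bit.
    print(tiled_matmul(A, B) == naive_matmul(A, B))
```

The loop nest only changes the order in which partial products are accumulated. On real hardware, that reordering is what lets each level reuse data from a faster memory tier.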