https://docs.nvidia.com/cutlass/index.html
It presumably relies on various assumptions and speedups specific to CUTLASS, NVIDIA's matrix multiplication library.