I took a few minutes to try to make it work on ROCm (AMD's alternative to CUDA) and landed in Python dependency hell.
What do you mean? They are the ones introducing the matmul extensions to Vulkan, which makes compute like this possible.