I wonder to what extent Vulkan compute could be used for this. Of course, it is only an option on their RDNA GPUs, since CDNA is not built for graphics, even though that is the G in GPU.
There has been some testing within llama.cpp, which supports both Vulkan and ROCm (rocBLAS) backends. When it works, the latter is about 2x faster than the Vulkan version.