Hacker News new | ask | show | jobs
by adgjlsfhk1 795 days ago
beating MKL for <100x100 is pretty doable. the BLAS framework has a decent amount of inherent overhead, so just exposing a better API (e.g. one that specifies the array types and sizes well) makes it pretty easy to improve things. For big sizes though, MKL is incredibly good.
1 comments

If you are talking about non-small matrix multiplication in MKL, is now in opensource as a part of oneDNN. It literally has exactly the same code, as in MKL (you can see this by inspecting constants or doing high-precision benchmarks).

For small matmul there is libxsmm. It may take tremendous efforts make something faster than oneDNN and libxsmm, as jit-based approach of https://github.com/oneapi-src/oneDNN/blob/main/src/gpu/jit/g... is too flexible: if someone finds a better sequence, oneDNN can reuse it without major change of design.

But MKL is not limited to matmul, I understand it...