by adgjlsfhk1
795 days ago
Beating MKL for matrices smaller than 100x100 is pretty doable. The BLAS interface has a decent amount of inherent overhead, so just exposing a better API (e.g. one that specifies the array types and sizes up front) makes it pretty easy to improve things. For big sizes, though, MKL is incredibly good.
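To make the "better API" point concrete, here is a minimal sketch of what specifying types and sizes up front could look like. `Mat` and `matmul` are hypothetical names made up for illustration, not part of MKL, BLAS, or any real library, and the loop is a plain triple loop rather than a tuned kernel. The idea is only that when the element type and dimensions are part of the type, there are no runtime shape checks or dispatch on every call, and the compiler is free to unroll and vectorize for the exact size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size matrix: element type and dimensions live in the type.
template <typename T, std::size_t Rows, std::size_t Cols>
struct Mat {
    std::array<T, Rows * Cols> data{};  // row-major storage, zero-initialized

    T& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// C = A * B. The inner dimension K must match at compile time, so a shape
// mismatch is a type error rather than a runtime check.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
Mat<T, M, N> matmul(const Mat<T, M, K>& a, const Mat<T, K, N>& b) {
    Mat<T, M, N> c{};
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            const T aik = a(i, k);  // reused across the whole row of b
            for (std::size_t j = 0; j < N; ++j) {
                c(i, j) += aik * b(k, j);
            }
        }
    }
    return c;
}
```

Whether something like this actually beats MKL at a given size depends on the compiler, flags, and hardware, so treat it as an illustration of the API shape, not a benchmark claim.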
|
For small matmul there is libxsmm. It may take tremendous effort to make something faster than oneDNN and libxsmm, as the JIT-based approach of https://github.com/oneapi-src/oneDNN/blob/main/src/gpu/jit/g... is very flexible: if someone finds a better sequence, oneDNN can reuse it without a major change of design.
But MKL is not limited to matmul, I understand it...