by microtonal
2122 days ago
The plot thickens. As I reported elsewhere in the thread, the slow code paths were selected on my machine unless I overrode the mkl_serv_intel_cpu_true function to always return true. However, that was with PyTorch. I have now also compiled the ACE DGEMM benchmark and linked it against MKL with iomp:

  $ ./mt-dgemm 1000 | grep GFLOP
  GFLOP/s rate: 69.124168 GF/s

The most-used function is:

  mt-dgemm libmkl_def.so [.] mkl_blas_def_dgemm_kernel_zen

So it is clearly using a Zen-specific GEMM kernel. Now I wonder what is different between PyTorch and this simple benchmark that causes PyTorch to end up on a slow SSE code path.
Conclusion: MKL detects Zen now, but currently only implements a Zen code path for dgemm and not for sgemm. To get good performance for sgemm, you have to fake being an Intel CPU.
Edit, longer description: https://github.com/pytorch/builder/issues/504
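
For reference, faking an Intel CPU as mentioned above can be sketched as a tiny preloaded shim. This is a minimal sketch, not necessarily the exact override used here: the file name fakeintel.c and library name libfakeintel.so are made up for illustration, and it assumes MKL is dynamically linked so that the dynamic loader resolves mkl_serv_intel_cpu_true to the preloaded definition first.

```c
/* fakeintel.c (hypothetical file name): minimal LD_PRELOAD shim.
 *
 * Assumption: the application loads MKL as a shared library, so this
 * definition of mkl_serv_intel_cpu_true takes precedence over MKL's own.
 *
 * Build (assumed commands, adjust to your toolchain):
 *   gcc -shared -fPIC -o libfakeintel.so fakeintel.c
 * Run:
 *   LD_PRELOAD=./libfakeintel.so ./your_program
 */

/* Always report "this is a genuine Intel CPU" to MKL's dispatcher. */
int mkl_serv_intel_cpu_true(void) {
    return 1;
}
```

Whether this helps depends on how the binary links MKL; with a statically linked MKL the preloaded symbol would likely not be picked up.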