by stillyslalom
2123 days ago
Comparing OpenBLAS and MKL with `peakflops` in Julia, there's definitely an advantage for MKL:
julia> using LinearAlgebra
julia> BLAS.vendor()
:openblas64
julia> BLAS.set_num_threads(1)
julia> peakflops()
3.9023447970402664e10
julia> using LinearAlgebra
julia> BLAS.vendor()
:mkl
julia> BLAS.set_num_threads(1)
julia> peakflops()
4.8113846984735275e10
That's close to the ~50 Gflops I saw in @celrod's benchmarks.
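For anyone who wants to reproduce this without relying on `peakflops` itself, here is a rough sketch of the same kind of measurement: time a double-precision matrix multiply and divide the nominal 2n^3 floating-point operations by the elapsed time. The function name `gemm_flops` and its defaults are made up for illustration, and this is only an approximation of what `peakflops` reports, not its actual implementation.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): estimate DGEMM throughput
# in flop/s by timing C = A * B for n-by-n Float64 matrices.
function gemm_flops(n::Integer = 2000; trials::Integer = 3)
    A = rand(Float64, n, n)
    B = rand(Float64, n, n)
    C = similar(A)
    mul!(C, A, B)                      # warm-up run, excludes compilation time
    best = Inf
    for _ in 1:trials
        t = @elapsed mul!(C, A, B)     # in-place multiply, dispatches to BLAS gemm
        best = min(best, t)
    end
    return 2 * Float64(n)^3 / best     # nominal flop count of a dense matmul
end

BLAS.set_num_threads(1)                # single-threaded, as in the sessions above
println(gemm_flops() / 1e9, " Gflop/s")
```

Run once per BLAS backend, the two numbers should be comparable in the same way as the `peakflops()` results above (about 39 vs. 48 Gflop/s), though the exact values depend on the machine and matrix size.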
|
I have now also compiled the ACE DGEMM benchmark and linked it against MKL iomp:
Most-used function is
So, it is clearly using a GEMM kernel. Now I wonder what differs between PyTorch and this simple benchmark that causes PyTorch to end up on a slow SSE code path.