by adgjlsfhk1
1678 days ago
Octavian is definitely early in its development (currently I think it only supports matmul, including all the transposed versions). https://raw.githubusercontent.com/JuliaLinearAlgebra/Octavia... is the benchmark. It uses automatic threading in both MKL and Octavian (although for these sizes, it will only use a few threads). With only one thread, MKL is much closer: it is behind by only about 20% at n=25 and roughly equal by n=60. I haven't done timings with MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ, but I think that would be unfair, since Octavian has the same overhead of figuring out how many threads to use.
|
Just one clarification: MKL_DIRECT_CALL and MKL_DIRECT_CALL_SEQ are not about figuring out how many threads to use. They are about turning off the checks on input argument sizes, e.g. whether m > lda, or whether lda or m is negative, and so on. All these pedantic checks (which follow the reference BLAS implementation on Netlib) are often not done anyway in experimental linear algebra packages that do not aim to provide a compliant implementation of the standard Fortran BLAS.
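
To make the kind of check concrete, here is a minimal Python sketch modelled on the argument validation at the top of the Netlib reference DGEMM (column-major storage). The function name `dgemm_arg_check` is made up for illustration; it is not part of MKL, Octavian, or any BLAS binding. It returns the 1-based position of the first bad argument, the value the reference routine would report through XERBLA, or 0 if everything is consistent.

```python
def dgemm_arg_check(transa, transb, m, n, k, lda, ldb, ldc):
    """Hypothetical helper: mimic the reference DGEMM argument checks.

    Returns 0 when the arguments are consistent, otherwise the 1-based
    position of the first offending argument in the DGEMM signature
    (TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC).
    """
    ta, tb = transa.upper(), transb.upper()
    # Stored row counts of A and B depend on whether they are transposed.
    nrowa = m if ta == "N" else k
    nrowb = k if tb == "N" else n

    if ta not in ("N", "T", "C"):
        return 1
    if tb not in ("N", "T", "C"):
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    # Leading dimensions must cover the stored rows (the "m > lda" case).
    if lda < max(1, nrowa):
        return 8
    if ldb < max(1, nrowb):
        return 10
    if ldc < max(1, m):
        return 13
    return 0
```

Each test is only a few integer comparisons, so the cost is invisible for large matrices and only becomes noticeable relative to the arithmetic at very small sizes like the ones in this benchmark.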