Hacker News
by adgjlsfhk1 1678 days ago
Octavian is definitely early in its development (currently I think it only supports matmul, including all the transposed versions). https://raw.githubusercontent.com/JuliaLinearAlgebra/Octavia... is the benchmark. It uses automatic threading for both MKL and Octavian (although for these sizes, it will only use a few threads). With only one thread, MKL is much closer: it is behind by only about 20% at n=25 and roughly equal by n=60. I haven't done timings with MKL_DIRECT_CALL or MKL_DIRECT_CALL_SEQ, but I think that would be unfair, since Octavian has the same overhead of figuring out how many threads to use.
1 comment

Looking forward to seeing Octavian's development then, it looks exciting! Dealing with triangular matrices and data dependencies in other linear algebra routines, such as triangular solves and factorizations, will surely be an interesting benchmark for the approach, since such difficulties do not arise in matrix-matrix multiplication. Anyway, matmul is surely a good starting point for Octavian.
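
To make the data-dependency point concrete, here is a minimal pure-Python sketch (the function names are mine, purely illustrative, and have nothing to do with Octavian's or MKL's internals). In a lower-triangular solve, each x[i] needs every earlier x[j], so the outer loop cannot simply be split across threads, whereas each entry of a matrix product can be computed independently.

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular L given as a list of rows."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # x[i] depends on x[0..i-1]: the iterations of this loop are ordered.
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x


def matmul(A, B):
    """Plain matrix product; every C[i][j] is independent of the others."""
    inner = len(B)
    cols = len(B[0])
    return [
        [sum(A[i][p] * B[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]
```

For example, `forward_substitution([[2.0, 0.0], [1.0, 1.0]], [4.0, 5.0])` has to produce x[0] before it can compute x[1].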

Just one clarification: MKL_DIRECT_CALL and MKL_DIRECT_CALL_SEQ are not about figuring out how many threads to use. They are about turning off the checks on input argument sizes, e.g. whether m>lda, or whether lda or m is negative, and so on. All these pedantic checks (which comply with the reference BLAS implementation in Netlib) are often not done anyway in experimental linear algebra packages that do not aim to provide a compliant implementation of the standard Fortran BLAS.
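
For a rough idea of what those checks look like, here is a Python sketch that mirrors the argument validation at the top of the Netlib reference DGEMM. The function name `dgemm_arg_info` is hypothetical; this is not MKL code, just an illustration of the kind of test being skipped. It returns the position of the first offending argument, in the spirit of the `info` value the reference routine passes to XERBLA, or 0 if everything is acceptable.

```python
def dgemm_arg_info(transa, transb, m, n, k, lda, ldb, ldc):
    """Return 0 if the arguments pass, else the 1-based position of the
    first bad argument in DGEMM's argument list (hypothetical helper)."""
    ta, tb = transa.upper(), transb.upper()
    # Row counts of A and B as stored, which depend on the transpose flags.
    nrowa = m if ta == "N" else k
    nrowb = k if tb == "N" else n
    if ta not in ("N", "T", "C"):
        return 1
    if tb not in ("N", "T", "C"):
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    # Leading dimensions must cover the stored row count (and be >= 1).
    if lda < max(1, nrowa):
        return 8
    if ldb < max(1, nrowb):
        return 10
    if ldc < max(1, m):
        return 13
    return 0
```

A call such as `dgemm_arg_info("N", "N", 4, 4, 4, 2, 4, 4)` flags argument 8 (lda), because A is stored with 4 rows but lda is only 2.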