Hacker News new | ask | show | jobs
by celrod 2175 days ago
That's what I did when calling OpenBLAS and MKL, but I confess I don't know the internal details of a non-inlined `matmul` call in gfortran when you don't use `-fexternal-blas`.

Just writing three loops and letting the compiler optimize it was much faster for `A * B'`, so it must be a pretty naive implementation getting called.