by quanto
883 days ago
I would love to be corrected: do you have some specific examples or some underlying principles? My professional and academic background is in numerical analysis and scientific computing (including BLAS/LAPACK level implementations), but I admit I haven't done deep numerical implementations in years.
At the other end of the spectrum, getting a matrix-matrix multiply to run fast isn’t easy either; that difficulty is what necessitated the kind of approach the authors of BLIS adopted. It’s easy on paper, but making it fast on real hardware isn’t.
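
To make the "easy on paper" point concrete, here is a rough sketch of the textbook triple loop next to a cache-blocked (tiled) version of the same computation. This is not BLIS's actual code: the function names and the `bs` block-size parameter are made up for illustration, and pure Python won't show any speedup. It only shows how the loop structure changes once you start caring about memory locality, which is just the first of several layers a real implementation adds (packing, register-level micro-kernels, and so on).

```python
def matmul_naive(A, B):
    """Textbook definition: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, bs=2):
    """Same product, computed tile by tile (loop tiling).

    bs is a hypothetical block size; in compiled code it would be chosen
    so that the tiles being worked on fit in cache.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Outer three loops walk over tiles of C, A, and B.
    for i0 in range(0, n, bs):
        for p0 in range(0, k, bs):
            for j0 in range(0, m, bs):
                # Inner three loops do a small matmul on one set of tiles.
                for i in range(i0, min(i0 + bs, n)):
                    for p in range(p0, min(p0 + bs, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + bs, m)):
                            C[i][j] += a * B[p][j]
    return C
```

Both functions return the same matrix; the second one just visits the data in a different order. In a compiled language that reordering is where the first big chunk of performance comes from, and it is still a long way from what a tuned BLAS does.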