by gnufx
1538 days ago
I'd guess that earlier machine-specific BLAS implementations used assembly. The fundamental feature of the GotoBLAS approach is the blocking structure; in BLIS, that structure lets plain C kernels built with a vectorizing compiler reach >80% of the performance of hand-tuned assembly/intrinsics-based kernels. Not to minimize Goto's work, but I wonder if van de Geijn, who I think was Goto's supervisor, deserves more credit than he gets.
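
For anyone unsure what "blocking structure" means here, this is a rough sketch of the idea in plain C, not the actual GotoBLAS or BLIS code. The function name `gemm_blocked` and the block sizes `MC`, `KC`, `NC` are made up for illustration and are not tuned for any machine. Real implementations also pack the blocks of A and B into contiguous buffers and use a register-blocked micro-kernel, which this leaves out.

```c
#include <stddef.h>

/* Illustrative block sizes only; real values are chosen per cache level. */
enum { MC = 64, KC = 64, NC = 64 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C (m x n) += A (m x k) * B (k x n); all matrices row-major, unpadded.
 * The three outer loops walk blocks so each block of work touches a
 * cache-sized piece of A and B; the inner loops are simple C that a
 * vectorizing compiler can handle. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C)
{
    for (size_t jc = 0; jc < n; jc += NC) {          /* column blocks of B, C */
        size_t nb = min_sz(NC, n - jc);
        for (size_t pc = 0; pc < k; pc += KC) {      /* blocks along k */
            size_t kb = min_sz(KC, k - pc);
            for (size_t ic = 0; ic < m; ic += MC) {  /* row blocks of A, C */
                size_t mb = min_sz(MC, m - ic);
                /* Inner kernel on one mb x kb by kb x nb block pair. */
                for (size_t i = 0; i < mb; i++) {
                    for (size_t p = 0; p < kb; p++) {
                        double a = A[(ic + i) * k + (pc + p)];
                        for (size_t j = 0; j < nb; j++)
                            C[(ic + i) * n + (jc + j)] +=
                                a * B[(pc + p) * n + (jc + j)];
                    }
                }
            }
        }
    }
}
```

The result is the same as a naive triple loop; only the order in which memory is visited changes, which is where the performance comes from.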