by gnufx 1538 days ago
I'd guess that previous machine-specific BLAS implementations used assembly. The fundamental feature of the GotoBLAS approach is its blocking structure: in BLIS, that structure with plain C kernels and a vectorizing compiler can reach more than 80% of the performance of hand-tuned assembly or intrinsics kernels. Not to minimize Goto's work, but I wonder whether van de Geijn, who I think was Goto's supervisor, deserves more credit than he gets.
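For anyone wondering what "blocking structure" means concretely, here is a rough sketch in plain C of the Goto-style loop nest as BLIS describes it: five loops around a small MR x NR micro-kernel, with panels of A and B packed into contiguous buffers so they stay in cache. This is my own illustration, not GotoBLAS or BLIS code. The function names (gemm_blocked, pack_A, pack_B, micro_kernel) are hypothetical, and the block sizes are small made-up values so that edge cases get exercised. Real libraries choose MR, NR, MC, KC, NC from register counts and cache sizes. As I understand BLIS, roughly only the micro-kernel is meant to be architecture-specific; here it is just plain C that a vectorizing compiler may or may not turn into good SIMD code.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative block sizes (assumed values, not tuned for any CPU). */
enum { MR = 4, NR = 4, MC = 16, KC = 32, NC = 64 };

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* C[0:mr, 0:nr] += Ap * Bp. Ap is an MR x kc micro-panel stored column by
   column, Bp a kc x NR micro-panel stored row by row (both zero-padded).
   mr < MR or nr < NR only at the matrix edges. */
static void micro_kernel(int kc, const double *Ap, const double *Bp,
                         double *C, int ldc, int mr, int nr) {
    double acc[MR][NR] = {{0}};  /* ideally held in registers */
    for (int p = 0; p < kc; ++p)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += Ap[p * MR + i] * Bp[p * NR + j];
    for (int j = 0; j < nr; ++j)
        for (int i = 0; i < mr; ++i)
            C[i + j * ldc] += acc[i][j];
}

/* Pack an mc x kc block of A into MR-row micro-panels, zero-padding the edge. */
static void pack_A(int mc, int kc, const double *A, int lda, double *Ap) {
    for (int i0 = 0; i0 < mc; i0 += MR)
        for (int p = 0; p < kc; ++p)
            for (int i = 0; i < MR; ++i)
                *Ap++ = (i0 + i < mc) ? A[(i0 + i) + p * lda] : 0.0;
}

/* Pack a kc x nc block of B into NR-column micro-panels, zero-padding the edge. */
static void pack_B(int kc, int nc, const double *B, int ldb, double *Bp) {
    for (int j0 = 0; j0 < nc; j0 += NR)
        for (int p = 0; p < kc; ++p)
            for (int j = 0; j < NR; ++j)
                *Bp++ = (j0 + j < nc) ? B[p + (j0 + j) * ldb] : 0.0;
}

/* C (m x n) += A (m x k) * B (k x n), column-major. Returns 0 on success. */
int gemm_blocked(int m, int n, int k, const double *A, int lda,
                 const double *B, int ldb, double *C, int ldc) {
    /* Packed buffers are only big enough if panels tile the blocks evenly. */
    assert(MC % MR == 0 && NC % NR == 0);
    double *Ap = malloc(sizeof(double) * MC * KC);
    double *Bp = malloc(sizeof(double) * KC * NC);
    if (!Ap || !Bp) { free(Ap); free(Bp); return -1; }
    for (int jc = 0; jc < n; jc += NC) {          /* loop 5: NC columns */
        int nc = MIN(NC, n - jc);
        for (int pc = 0; pc < k; pc += KC) {      /* loop 4: KC-deep slice */
            int kc = MIN(KC, k - pc);
            pack_B(kc, nc, &B[pc + jc * ldb], ldb, Bp);  /* roughly L3-sized */
            for (int ic = 0; ic < m; ic += MC) {  /* loop 3: MC rows */
                int mc = MIN(MC, m - ic);
                pack_A(mc, kc, &A[ic + pc * lda], lda, Ap);  /* roughly L2-sized */
                for (int jr = 0; jr < nc; jr += NR)      /* loop 2 */
                    for (int ir = 0; ir < mc; ir += MR)  /* loop 1 */
                        micro_kernel(kc, &Ap[ir * kc], &Bp[jr * kc],
                                     &C[(ic + ir) + (jc + jr) * ldc], ldc,
                                     MIN(MR, mc - ir), MIN(NR, nc - jr));
            }
        }
    }
    free(Ap);
    free(Bp);
    return 0;
}
```

The point is that the loop nest and packing are ordinary portable code. Only micro_kernel would need replacing with a tuned version per CPU, which fits the idea that the blocking, rather than the assembly, does most of the work.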