by bbitmaster 1538 days ago
Ah, I personally preferred GotoBLAS to LINPACK as it was faster (keep in mind this was a few years ago, when I did research; I have no idea how it has kept up). I also enjoyed the story of how an unknown Japanese student (Mr. Goto, cool name) outdid all the top linear algebra implementations by hand-writing everything in assembly. It was an amazing story, from a time when everyone said assembly was dead.
2 comments

I'd guess that previous machine-specific BLAS implementations used assembly. The fundamental feature of the GotoBLAS approach is the blocking structure; in BLIS, that structure with plain C and a vectorizing compiler can provide >80% of the performance of hand-tuned assembly/intrinsics-based kernels. Not to minimize Goto's work, but I wonder if van de Geijn, who I think was Goto's supervisor, deserves more credit than he gets.
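To make "blocking structure" concrete, here is a rough plain-C sketch of a cache-blocked matrix multiply. It is a simplified loop-tiling illustration, not the actual GotoBLAS or BLIS code: it skips the packing of blocks into contiguous buffers and the register-blocked micro-kernel that the real libraries use. The function name `blocked_gemm` and the block sizes `MC`, `KC`, `NC` are made up for the example; real libraries tune those sizes per cache level.

```c
#include <stddef.h>

/* Hypothetical block sizes, chosen only for illustration. */
enum { MC = 64, KC = 64, NC = 64 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for row-major matrices:
 * A is m x k, B is k x n, C is m x n. */
void blocked_gemm(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C)
{
    /* Outer loops walk over blocks so that each block of A and B
     * is reused many times while it is still in cache. */
    for (size_t jc = 0; jc < n; jc += NC) {
        size_t nb = min_sz(NC, n - jc);
        for (size_t pc = 0; pc < k; pc += KC) {
            size_t kb = min_sz(KC, k - pc);
            for (size_t ic = 0; ic < m; ic += MC) {
                size_t mb = min_sz(MC, m - ic);

                /* Inner kernel on one mb x nb block of C. The innermost
                 * loop is unit-stride in B and C, which is the kind of
                 * loop a vectorizing compiler can handle. */
                for (size_t i = 0; i < mb; ++i) {
                    for (size_t p = 0; p < kb; ++p) {
                        double a = A[(ic + i) * k + (pc + p)];
                        for (size_t j = 0; j < nb; ++j) {
                            C[(ic + i) * n + (jc + j)] +=
                                a * B[(pc + p) * n + (jc + j)];
                        }
                    }
                }
            }
        }
    }
}
```

The point of the sketch is only that the arithmetic is the same as in the naive triple loop; what changes is the order in which memory is touched.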
Even at Dongarra's lab, GotoBLAS saw heavy use. LINPACK didn't need to be the fastest, just the original/reference implementation.