|
|
|
|
|
by marshallward
809 days ago
|
|
Fair enough, this is not meant to be some endorsement of the standard Fortran BLAS implementations over the optimized versions cited above. Only that the mainstream compilers cited above appear capable of applying these optimizations to the standard BLAS Fortran without any additional effort. I am basing these comments on quick inspection of the assembly output. Timings would be equally interesting to compare at each stage, but I'm only willing to go so far for a Hacker News comment. So all I will say is perhaps let's keep an open mind about the capability of simple Fortran code. |
|
[0] https://www.cs.utexas.edu/users/rvdg/tmp/TSoPMC.pdf