by srean
5274 days ago
I don't think I claimed that a vendor "cannot do things differently". Not sure where the downvotes came from, so just clarifying. I would also hazard a guess that the tuning you mention does not change the algorithm but essentially reorders its steps to get better cache behavior. I was responding to the parent post, which conjectured that BLAS implementations use different algorithms. If you look at your two suggestions, neither of them changes the complexity class of the number of floating-point operations, but wall clock time, oh absolutely. Although there are matrix multiplication algorithms with complexity below O(N^3), their constants are so large that matrices big enough to see any appreciable benefit are extremely rare.
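To make the distinction concrete, here is a minimal sketch (not taken from any BLAS implementation; the function names `matmul_ijk` and `matmul_ikj` are made up for illustration). Both versions perform exactly N^3 multiply-adds and produce the same result; only the loop order differs. In a compiled language with row-major arrays, the second order would typically be friendlier to the cache because the inner loop walks along rows. Pure Python lists will not show that timing effect reliably, so the sketch only demonstrates that the operation count is unchanged.

```python
def matmul_ijk(a, b):
    """Textbook loop order: the inner loop walks down a column of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    flops = 0  # counts multiply-adds
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
                flops += 1
    return c, flops


def matmul_ikj(a, b):
    """Same algorithm, reordered: the inner loop walks along a row of b and c."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    flops = 0  # counts multiply-adds
    for i in range(n):
        for k in range(m):
            aik = a[i][k]  # reused across the whole inner loop
            for j in range(p):
                c[i][j] += aik * b[k][j]
                flops += 1
    return c, flops


# Small illustrative inputs (arbitrary values chosen for the example).
n = 8
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i * j + 1) for j in range(n)] for i in range(n)]

c1, f1 = matmul_ijk(a, b)
c2, f2 = matmul_ikj(a, b)
print(c1 == c2, f1, f2)
```

Reordering (or blocking) the loops changes which memory is touched when, not how many operations are done, which is why it affects wall clock time but not the O(N^3) operation count.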