|
|
|
|
|
by hedora
2949 days ago
|
|
I took a different lesson from learning to implement matrix multiply: you can get a small amount of code within a few percent of peak performance if you try. So, the trick is to keep the 99% of the code that does 10% of the work within an order of magnitude of peak performance, and then get the rest within a factor of two. That gets the system within a factor of three, which is still embarrassing, but that’s what profiling tools are for. :-) |
|