by hedora 2949 days ago
I took a different lesson from learning to implement matrix multiply: if you try, you can get a small amount of code within a few percent of peak performance. So the trick is to keep the 99% of the code that does 10% of the work within an order of magnitude of peak, and then get the remaining 1% of the code, which does the other 90% of the work, within a factor of two.

That gets the system within a factor of three, which is still embarrassing, but that’s what profiling tools are for. :-)
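The factor-of-three figure follows from simple weighted arithmetic. Below is a minimal sketch, assuming each part of the work runs at a fixed fraction of peak and the parts don't interact, so each part's time is its share of the work multiplied by its slowdown. The function name `overall_slowdown` and the second scenario are hypothetical, for illustration only.

```python
from typing import Sequence


def overall_slowdown(work_fractions: Sequence[float], slowdowns: Sequence[float]) -> float:
    """Return total runtime relative to running all of the work at peak speed.

    Assumed cost model: part i does work_fractions[i] of the total work at
    1/slowdowns[i] of peak throughput, so it takes work_fractions[i] * slowdowns[i]
    units of time, where 1.0 is the runtime of the whole program at peak.
    """
    if len(work_fractions) != len(slowdowns):
        raise ValueError("need one slowdown per work fraction")
    if abs(sum(work_fractions) - 1.0) > 1e-9:
        raise ValueError("work fractions must sum to 1")
    # Weighted sum of slowdowns, weighted by each part's share of the work.
    return sum(f * s for f, s in zip(work_fractions, slowdowns))


# Split from the comment: 10% of the work (most of the code) runs 10x off peak,
# 90% of the work (the hot kernels) runs 2x off peak.
typical = overall_slowdown([0.1, 0.9], [10.0, 2.0])
print(f"{typical:.1f}x")  # 1.0 + 1.8 -> prints "2.8x"

# Hypothetical: hot kernels at exactly peak, cold code still 10x off.
hot_at_peak = overall_slowdown([0.1, 0.9], [10.0, 1.0])
print(f"{hot_at_peak:.1f}x")  # 1.0 + 0.9 -> prints "1.9x"
```

Under this model, the 10% of the work left at 10x off peak costs as much time as an entire program at peak, which is why it shows up so clearly in a profile.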