Hacker News
by throwaway4good 713 days ago
What is the point of making the matrix multiplication itself multithreaded (other than benchmarking)? Wouldn't it be more beneficial in practice to put the multithreading in the algorithm that uses the multiplication?
1 comment

That's indeed what's typically done in HPC. However, simply substituting a parallel BLAS can speed up the right sort of R code, for instance. HPC codes, on the other hand, typically aren't bottlenecked on GEMM.
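
To make the "parallelism in the caller" idea concrete, here is a rough sketch in Python, not taken from any particular HPC code. The names `matmul` and `batched_matmul` are made up for illustration. Each product is computed serially, and the threading lives in the outer loop over independent products. In CPython a pure-Python kernel like this won't actually run faster with threads because of the GIL; the assumption is that in real code the inner kernel would be a single-threaded native GEMM call.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Plain single-threaded product of two matrices given as lists of rows."""
    cols_b = list(zip(*b))  # columns of b, so each entry is a row-by-column dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def batched_matmul(pairs, workers=4):
    """Compute many independent products concurrently; each product stays serial."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results[i] corresponds to pairs[i].
        return list(pool.map(lambda p: matmul(*p), pairs))


# Hypothetical workload: four small, independent products.
pairs = [([[1, 2], [3, 4]], [[k, 0], [0, k]]) for k in range(1, 5)]
results = batched_matmul(pairs)
```

Swapping in a parallel BLAS is the opposite arrangement: the caller stays serial and only the kernel is threaded, which is why it can help existing code without restructuring it.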