Hacker News new | ask | show | jobs
by TheRealKing 812 days ago
No, you can still trust compilers: 1) The hand-tuned BLAS routines are essentially a different algorithm with hard-coded information. 2) The default OpenBLAS uses OpenMP parallelism, so much speed likely originates from multithreading. Set OMP_NUM_THREADS environment variable to 1 before running your benchmarks. You will still see a significant performance difference due to a few factors, such as extra hard-coded information in OpenBLAS implementation.
1 comments

I ran with OMP_NUM_THREADS=1, but your point is well taken.

As for the original post, I felt a bit embarrassed about my original comments, but I think the compilers actually did fairly well based on what they were given, which I think is what you are saying in your first part.