Hacker News new | ask | show | jobs
by marshallward 814 days ago
You are right, I just tested this out and my speed from BLAS to OpenBLAS went from 6 GFLOP/s to 150 GFLOP/s. I can only imagine what BLIS and MKL would give. I apologize for my ignorance. Apparently my faith in the compilers was wildly misplaced.
1 comments

No, you can still trust compilers: 1) The hand-tuned BLAS routines are essentially a different algorithm with hard-coded information. 2) The default OpenBLAS uses OpenMP parallelism, so much speed likely originates from multithreading. Set OMP_NUM_THREADS environment variable to 1 before running your benchmarks. You will still see a significant performance difference due to a few factors, such as extra hard-coded information in OpenBLAS implementation.
I ran with OMP_NUM_THREADS=1, but your point is well taken.

As for the original post, I felt a bit embarrassed about my original comments, but I think the compilers actually did fairly well based on what they were given, which I think is what you are saying in your first part.