by dsharlet
895 days ago
BLAS is getting almost exactly 100% of the theoretical peak performance of my machine (CPU frequency * 2 fmadd/cycle * 8 lanes * 2 ops/lane); it's not slow. I mean, just look at the profiler output... You're probably now comparing parallel code to single-threaded code.
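To make that peak formula concrete, here is a rough sketch of the arithmetic for a single core. The function names and the 3.5 GHz clock are made up for illustration (the comment doesn't give the actual frequency); only the 2 fmadd/cycle, 8 lanes, and 2 ops/lane factors come from the formula above.

```python
def peak_flops_per_core(freq_hz, fma_per_cycle=2, simd_lanes=8, ops_per_fma=2):
    """Theoretical single-core peak in FLOP/s.

    Defaults mirror the formula in the comment: 2 fmadd issued per cycle,
    8 SIMD lanes per fmadd, and 2 floating-point ops (multiply + add) per lane.
    """
    return freq_hz * fma_per_cycle * simd_lanes * ops_per_fma


def fraction_of_peak(measured_flops, freq_hz):
    """Ratio of a measured FLOP/s rate to the theoretical single-core peak."""
    return measured_flops / peak_flops_per_core(freq_hz)


if __name__ == "__main__":
    freq_hz = 3.5e9  # hypothetical clock speed, not taken from the comment
    peak = peak_flops_per_core(freq_hz)
    print(f"peak: {peak / 1e9:.1f} GFLOP/s per core")
```

This is a per-core number, so it is only a fair yardstick for a single-threaded run; a multithreaded BLAS would need to be compared against this figure multiplied by the number of cores in use.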