by bjourne
892 days ago
-O2 did improve performance significantly, but it's still 0.7 s for NumPy versus 5.1 s for your code on 4096x4096 matrices. Either you're using a slow version of BLAS or you're benchmarking with matrices that are comparatively tiny (384x1536 is nothing).

You're probably now comparing parallel code to single-threaded code.
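
For anyone who wants to check this on their own machine, here is a minimal timing sketch. It is not the script behind the numbers above; the function name `time_matmul`, the sizes, and the repeat count are my own choices for illustration, and it assumes float64 square matrices.

```python
import time

import numpy as np


def time_matmul(n, repeats=3, dtype=np.float64):
    """Return the best wall-clock time in seconds for one n x n matrix product.

    Hypothetical helper for illustration, not part of NumPy.
    """
    rng = np.random.default_rng(0)
    a = rng.random((n, n)).astype(dtype)
    b = rng.random((n, n)).astype(dtype)

    a @ b  # warm-up run so one-time setup cost is not measured

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Small sizes are dominated by overhead; large ones show what BLAS can do.
    for n in (512, 1024, 4096):
        seconds = time_matmul(n)
        # An n x n product needs roughly 2 * n^3 floating-point operations.
        gflops = 2 * n**3 / seconds / 1e9
        print(f"{n}x{n}: {seconds:.3f} s, about {gflops:.1f} GFLOP/s")
```

To make the comparison single-threaded on both sides, the usual approach is to set `OMP_NUM_THREADS=1` (and, depending on which BLAS your NumPy is linked against, `OPENBLAS_NUM_THREADS=1` or `MKL_NUM_THREADS=1`) in the environment before starting Python. `np.show_config()` prints which BLAS that is.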