by sbstp 718 days ago
Maybe -march=native gives it an edge: it compiles for this exact CPU model, whereas numpy is compiled for a more generic (older) x86-64 baseline. -march=native would probably get v4 on a Ryzen CPU that supports AVX-512 (older Ryzens top out at v3), whereas numpy is probably targeting v1 or v2.

https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...
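To make the v1 to v4 levels concrete, here is a rough sketch that maps a set of CPU feature flags to the highest x86-64 microarchitecture level they satisfy. The flag names assume the spelling Linux uses in /proc/cpuinfo (e.g. "pni" for SSE3, "abm" for LZCNT), and the helper names are made up for illustration; it is not how GCC or numpy decide anything.

```python
# Required features per level, using Linux /proc/cpuinfo flag names (assumption).
# Each level also requires every level before it.
LEVELS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level whose requirements (and all lower ones) are met.

    Assumes the machine is x86-64 at all, so the floor is v1.
    """
    level = "x86-64-v1"
    for name, required in LEVELS:
        if not required <= flags:  # some required feature is missing
            break
        level = name
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU on Linux; empty set elsewhere."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(x86_64_level(read_cpu_flags()))
```

On a machine without /proc/cpuinfo this just reports the v1 floor, so treat the output as meaningful on x86-64 Linux only.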


Doesn’t numpy have runtime SIMD dispatching and whatnot based on CPU flags?

E.g. https://github.com/numpy/numpy/blob/main/numpy/_core/src/com...
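For anyone unfamiliar with the idea, runtime dispatch roughly means shipping several builds of the same kernel and picking one at startup based on what the CPU reports. The toy below is only a sketch of that pattern in pure Python; the kernel names and the flag sets are invented for illustration and do not mirror numpy's actual dispatch code.

```python
def dot_baseline(a, b):
    # Portable fallback that works on any CPU.
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a, b):
    # Stand-in for a build compiled with AVX2/FMA; same result, hypothetically faster.
    return sum(x * y for x, y in zip(a, b))


def dot_avx512(a, b):
    # Stand-in for a build compiled with AVX-512.
    return sum(x * y for x, y in zip(a, b))


# Ordered from most to least specialised: (required flags, label, implementation).
KERNELS = [
    ({"avx512f"}, "avx512", dot_avx512),
    ({"avx2", "fma"}, "avx2", dot_avx2),
    (set(), "baseline", dot_baseline),
]


def select_kernel(cpu_flags):
    """Pick the first kernel whose required flags are all present."""
    for required, label, func in KERNELS:
        if required <= cpu_flags:
            return label, func
    raise RuntimeError("unreachable: baseline has no requirements")


label, dot = select_kernel({"sse2", "avx", "avx2", "fma"})
print(label, dot([1, 2, 3], [4, 5, 6]))
```

The point is that a generic wheel can still use newer instructions for the kernels that have specialised builds, which is why the baseline target alone does not settle the performance question.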

np.matmul just uses whatever BLAS library your NumPy distribution was configured for/shipped with.

Could be MKL (I believe the conda version comes with it) but it could also be an ancient version of OpenBLAS you already had installed. So yeah, being faster than np.matmul probably just means your NumPy is not installed optimally.
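If you want to check which BLAS your own install is linked against, np.show_config() prints the build configuration. A small sketch that captures that output as text follows; the helper name is made up, and the exact layout of the output differs between NumPy versions, so the substring check is only a rough heuristic.

```python
import contextlib
import io


def blas_config_text():
    """Return NumPy's build configuration as a string, or None without NumPy."""
    try:
        import numpy as np
    except ImportError:
        return None
    buf = io.StringIO()
    # show_config() prints to stdout, so capture it instead of parsing internals.
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


text = blas_config_text()
if text is None:
    print("NumPy is not installed")
else:
    lowered = text.lower()
    # Rough heuristic: look for the names of common BLAS implementations.
    found = [name for name in ("mkl", "openblas", "accelerate", "blis") if name in lowered]
    print("BLAS candidates mentioned in the build config:", found or "none recognised")
```

If nothing recognisable shows up, NumPy may have been built against a reference or bundled BLAS, which would fit the "not installed optimally" explanation.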