Hacker News new | ask | show | jobs
by cdavid 5274 days ago
NumPy dev here.

Note that numpy may not use BLAS (this was done though as to avoid any hard dependencies on 3rd party libraries). I am not sure what you mean by putting the FLOP count below what's required. BLAS will still need O(N^3) operations for a NxN matrix multiplications, whether they are optimized or not. The biggest difference between libraries is usually in clever data organization/passing to use the cpu cache as efficiently as possible (memory throughput is usually the bottleneck until your data don't fit in RAM). You can easily gain one order of magnitude using MKL compared to a naive implementation in C (that you should never, ever do, BTW).

1 comments

> BLAS will still need O(N^3) operations for a NxN matrix multiplications, whether they are optimized or not.

Why wouldn't they use an algorithm that is better than O(N^3)?

Two reasons: first, the matrix multiply algorithms with exponents less than three do not have the same numerical stability properties, which can introduce subtle bugs into software that was developed with the expectation of a "usual" O(n^3) multiplication being used. This makes it unsuitable for use in a general-purpose library.

Second, although algorithms with smaller exponents exist, there is more to high-performance than asymptotic complexity. In particular, the constant factors associated with these "fast" algorithms are typically large enough that there is no benefit to using them for "reasonable" matrix sizes.