by ique
5274 days ago
Even if we don't consider the difference in data structures here, they use wildly different algorithms. Numpy does all its matrix calculations by outsourcing them to BLAS[1] routines that are a mix of C and assembly, just as the answers detail. BLAS is not only written in more efficient code, it uses different algorithms altogether. BLAS can do a lot of optimizations that bring the total FLOP count below what's usually considered required for matrix multiplication (2m*n^2).
[1]: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogram...
|
Note that numpy may not use BLAS (this was done so as to avoid any hard dependency on 3rd-party libraries). I am not sure what you mean by putting the FLOP count below what's required. BLAS will still need O(N^3) operations for an NxN matrix multiplication, whether the routines are optimized or not. The biggest difference between libraries is usually in clever data organization and passing, so that the CPU cache is used as efficiently as possible (memory throughput is usually the bottleneck until your data no longer fit in RAM). You can easily gain one order of magnitude using MKL compared to a naive implementation in C (which you should never, ever write yourself, BTW).
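To make the O(N^3) point concrete, here is a rough sketch that counts the floating-point operations of the textbook triple loop and checks the result against numpy's own product. The helper name `naive_matmul` and the size n = 60 are made up for illustration, and the naive version is pure Python rather than C, so the timing gap it prints overstates what you would see against naive C.

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Textbook triple loop on nested lists (hypothetical helper).

    Returns the product and the number of floating-point operations,
    counting one multiply and one add per inner iteration.
    """
    n, k, m = len(a), len(a[0]), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
                flops += 2
            out[i][j] = acc
    return out, flops

n = 60  # arbitrary small size so the pure-Python loop finishes quickly
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

t0 = time.perf_counter()
naive, flops = naive_matmul(a.tolist(), b.tolist())
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # dispatched to BLAS if numpy was built against one
t_numpy = time.perf_counter() - t0

print("flops counted:", flops, "expected 2*n^3:", 2 * n**3)
print("results agree:", np.allclose(naive, fast))
print(f"naive: {t_naive:.4f}s, numpy: {t_numpy:.6f}s")
```

The counter only describes the naive loop; it says nothing about how many operations an optimized library performs. To see which BLAS/LAPACK, if any, your numpy build is linked against, `np.show_config()` prints the build configuration.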