Hacker News new | ask | show | jobs
by Tarlitz 3930 days ago
Your second point is spot on. I realized this when I tried to optimize my numpy code that does millions of dot products between 3x1 vectors and 3x3 matrices to no avail. I realized there is an awesome little library called tinyarray, that doesn't have the overhead of numpy, and is compatible with the basic numpy syntax. I exchanged all my numpy arrays for tinyarrays and got the easiest 10x speed up ever.
2 comments

Thanks for mentioning that, very useful!

If you look in the numpy source you can begin to understand why the overheads are so large: for every operation it's first necessary to apply rules for broadcasting between different sizes and types, plus often being callable with a number of different function signatures (that includes __getitem__). And in many cases all of that is implemented in Python.

fwiw - the best you can do here, is probably implement the dots yourself as cdef functions. If your arrays get larger, you can get a C pointer to the blas routine from scipy and call it from cython (which can be faster than the dot you write when your arrays are larger)