Hacker News new | ask | show | jobs
by sgtnoodle 1931 days ago
I mainly write embedded software in C and C++, but I tend to use python for non-embedded stuff. The funny thing is that I've attempted to speed up multiple python scripts by replacing lists of floats with numpy arrays, and each time it's ended up slightly slower. I suspect numpy only really starts to pay off when you're doing the same operations to 1000+ elements. For modest data sets, or algorithms that don't map particularly well to numpy's operations, the built in data types do better.

I also recently gave numba a go, and it was significantly slower than vanilla python. I was surprised because the decorated function seemed exactly like the type of code that numba would be good at (iterating through a list of coordinates doing square roots and some quirky floating-point modulo math.)

1 comments

Just checking, even though you likely know. The numpy arrays really only pay off when you vectorise your calculations. Don't expect any speed up if you're still using list comprehensions.

A vector operation is just where instead of having two lists, U and V which add to make W = [x+y for x,y in zip(U,V)] you directly operate on the arrays, W = U + V.

This allows the inner loop to be completely in native code. Sometimes some curious things happen where it's easier to do more vector operations resulting in more looping just because each of those operations is a vector operation and so is faster than the loop. For example, increment a number every time you see a particular element (time series breaking out subsequences) might look like np.cumsum(U == value) which is two loops in practice, but much faster than the iterative approach.