| If you spent a day doing some serious linear algebra in Python you might change your tune. * (multiplication) between NumPy arrays does element-wise multiplication by default (potentially with broadcasting), which is the desired default behavior. In R, for example, you can define custom infix operators so that a * b is elementwise but a %*% b is matrix multiplication. If you write down a complicated linear algebra expression, something like A.T ! (B.T ! C ! B).T ! A would be a lot friendlier to scientists than the current: dot(A.T, dot(dot(B.T, dot(C, B)).T, A)). You might just say "well, suck it up", but I've got to say that doing linear algebra in Matlab is a lot easier, because the linear algebra that I do with pen and paper looks pretty much exactly the same as the corresponding code. On the other hand, Matlab is super clunky compared with NumPy at doing APL-style array processing with broadcasting operations, etc. In my work I tend to do more of the latter and less of the former, but whenever I implement something with a lot of matrix multiplications it takes me a lot longer in Python to get things right. Anyway, the point is: non-scientific Python folks need to take a walk in our shoes to gain an understanding of the challenges we face on an ongoing basis. I'm having a hard time understanding your last statement. NumPy data types (dtypes) simply tell the ndarray how to interpret the block of data associated with it (the number of bytes per item, shape, and strides). |
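To make the contrast above concrete, here is a minimal sketch of element-wise `*`, broadcasting, and the nested `dot` expression from the comment. The matrices `A`, `B`, `C` and their shapes are made up for illustration. The infix line assumes Python 3.5 or later with a NumPy version that supports the `@` operator, which postdates the nested-`dot` style quoted here.

```python
import numpy as np

# Illustrative inputs only; the shapes are assumptions chosen so the products are defined.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# "*" multiplies matching entries; it is not a matrix product.
elementwise = B * C
matrix_product = np.dot(B, C)

# Broadcasting: a length-3 vector is stretched across the rows of B,
# so column j of B is scaled by scale[j].
scale = np.array([1.0, 2.0, 3.0])
scaled = B * scale

# The nested form from the comment: A^T (B^T C B)^T A.
nested = np.dot(A.T, np.dot(np.dot(B.T, np.dot(C, B)).T, A))

# The same expression with the infix matrix-multiplication operator
# (assumes Python 3.5+; not available when the nested form was the norm).
infix = A.T @ (B.T @ C @ B).T @ A

print(nested.shape)
print(np.allclose(nested, infix))
```

The nested call has to be read inside out, while the infix form reads left to right like the pen-and-paper expression, which is the readability gap the comment describes.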
My last comment about dtypes was that the size (shape in NumPy) should be part of the type; certainly the element type should be. A 1-D vector of doubles should not have the same Python type as a 3-D array of characters or a 500x600 matrix. This creates havoc in a dynamic language. When I started using NumPy, I once spent two days tracking down a bug caused by "*" multiplying two arrays elementwise instead of multiplying them as matrices. Perhaps size is too much for a dynamic language to make part of the type, but surely dimension and element type should be reflected in the Python type. It absolutely does not help that the documentation is littered with type-objects and object-objects; I appreciate that this is how C implementations are, but for beginner NumPy users it is more than a little confusing.
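The complaint can be illustrated with a short sketch. The three arrays mirror the examples named above. The helper `require_array` is hypothetical (it is not part of NumPy) and only shows the kind of runtime check one ends up writing when dimension and element type are not part of the Python type.

```python
import numpy as np

vec = np.zeros(5, dtype=np.float64)          # 1-D vector of doubles
chars = np.full((2, 3, 4), "a", dtype="U1")  # 3-D array of single characters
mat = np.ones((500, 600))                    # 500x600 matrix

# All three share one Python type; dimension and element type live in attributes.
print(type(vec) is type(chars) is type(mat))  # same class: numpy.ndarray
print(vec.ndim, chars.ndim, mat.ndim)
print(vec.dtype, chars.dtype, mat.dtype)

# The "*" pitfall: element-wise product versus matrix product.
a = np.array([[1, 2], [3, 4]])
elementwise = a * a        # [[1, 4], [9, 16]]
matrix = np.dot(a, a)      # [[7, 10], [15, 22]]


def require_array(arr, ndim, kind):
    """Hypothetical helper: fail early if ndim or dtype kind is not as expected."""
    if not isinstance(arr, np.ndarray):
        raise TypeError("expected a numpy.ndarray")
    if arr.ndim != ndim or arr.dtype.kind != kind:
        raise TypeError(
            f"expected ndim={ndim}, kind={kind!r}; "
            f"got ndim={arr.ndim}, kind={arr.dtype.kind!r}"
        )
    return arr


require_array(mat, ndim=2, kind="f")  # passes: 2-D floating-point array
```

A check like this only fires at run time, which is the weakness the comment is pointing at: nothing in the Python type itself distinguishes `vec` from `chars`.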
I have a lot of respect for the people who designed and built NumPy; for multi-dimensional arrays I don't see a better approach. But too many C implementation details seep into the Python interface; types and operator overloading are only the beginning of the problems. I am having second thoughts about the suitability of dynamic languages for large-scale, high-performance computing. Type annotations could help a lot, but I see exactly zero interest in that for NumPy.