|
If I understand https://github.com/numpy/numpy/blob/v1.22.0/numpy/core/einsu... and https://github.com/numpy/numpy/blob/v1.22.0/numpy/core/src/m... correctly, einsum without the optimize flag uses a for loop in C to do the multiplication. The optimizer is clearly meant to improve performance, but in many cases it doesn't seem to change anything. Let's simply multiply two stacks of matrices: x, y = np.random.rand(200, 200, 200), np.random.rand(200, 200, 200)
I can do %timeit x@y
40.3 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
or a naive %timeit np.einsum('bik,bkj->bij',x,y)
1.53 s ± 21.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But even with optimization, I see %timeit np.einsum('bik,bkj->bij',x,y, optimize=True)
1.54 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I'm not sure if I'm doing something wrong. |
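`optimize=True` mainly helps when there are more than two tensors in the expression, since what it optimizes is the order in which the operands are contracted. With only two operands there is no order to choose, so it often makes no difference, as in your timings.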
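
To make that concrete, here is a rough sketch of a three-operand contraction where the optimizer does have a choice to make. The shapes and the names `a`, `b`, `c` are made up for illustration and are not taken from the question. `np.einsum_path` reports the order it would pick, and that path can be passed back through `optimize=`. The two-operand batched case from the question is included only to check that it agrees with `x @ y`; no timings are claimed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small shapes so the example runs quickly.
a = rng.random((30, 40))
b = rng.random((40, 50))
c = rng.random((50, 5))

# With three operands there are several possible pairwise contraction orders.
# einsum_path returns the chosen path plus a human-readable report.
path, report = np.einsum_path('ij,jk,kl->il', a, b, c, optimize='greedy')
print(report)

# Same expression, once as a single un-optimized call and once with the path.
naive = np.einsum('ij,jk,kl->il', a, b, c)
optimized = np.einsum('ij,jk,kl->il', a, b, c, optimize=path)

# Two-operand batched matmul, as in the question (smaller arrays here).
x = rng.random((4, 6, 6))
y = rng.random((4, 6, 6))
batched = np.einsum('bik,bkj->bij', x, y, optimize=True)
```

For plain batched matrix multiplication, `x @ y` (or `np.matmul`) is probably still the simplest choice, as your first timing suggests.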