HN Mirror


by akasakahakada 969 days ago

Agree. Applying a plain Python for loop to a NumPy array to do simple math is just pure nonsense.
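To make the contrast concrete, here is a minimal sketch (not from the article) that averages the same array once with a plain Python loop and once with `np.average`. The helper name `loop_average` and the repeat counts are my own choices, and absolute timings will vary by machine.

```python
import timeit

import numpy as np


def loop_average(arr):
    """Average with a plain Python loop: one interpreted iteration per element."""
    total = 0.0
    for x in arr:  # each x is boxed into a NumPy scalar object
        total += x
    return total / len(arr)


if __name__ == "__main__":
    a = np.random.random(int(1e6))
    loop_s = timeit.timeit(lambda: loop_average(a), number=3) / 3
    numpy_s = timeit.timeit(lambda: np.average(a), number=100) / 100
    print(f"Python loop: {loop_s * 1e3:.1f} ms per call")
    print(f"np.average:  {numpy_s * 1e3:.3f} ms per call")
```

Both compute the same mean. The loop pays interpreter overhead on every element, while `np.average` runs a single compiled reduction over the buffer.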

I just tested how it performs without any compilation nonsense.

```
import numpy as np

a = np.random.random(int(1e6))

%timeit np.average(a)

%timeit np.average(a[::16])
```

My result: no matter how non-contiguous the access is in memory (here I take every 16th element, as the article does; I tested strides of 2, 4, 8, and 16), we perform fewer operations, so the strided version always ends up faster. By contrast, their SIMD-compiled code is 10-20X slower in the non-contiguous case.
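The stride sweep described above can be scripted with the standard `timeit` module so it also runs outside Jupyter. This is a sketch: the strides match the ones tested, but the function name `time_strided_average` and the repeat count are my own assumptions.

```python
import timeit

import numpy as np


def time_strided_average(arr, stride, number=200):
    """Mean seconds per call of np.average over every `stride`-th element of arr."""
    # A strided view, not a copy; for float64 input the elements sit stride*8 bytes apart.
    view = arr[::stride]
    return timeit.timeit(lambda: np.average(view), number=number) / number


if __name__ == "__main__":
    a = np.random.random(int(1e6))
    for stride in (1, 2, 4, 8, 16):
        seconds = time_strided_average(a, stride)
        print(f"stride {stride:2d}: {len(a[::stride]):7d} elements, {seconds * 1e6:8.1f} us")
```

Because the array size is fixed, a larger stride means fewer elements to sum, which is the "fewer operations" effect described above.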

For a larger array that is 16X the size of the contiguous one, where we take only 1/16 of its elements, the result is roughly 10X slower, as the article shows. But I suspect that is purely because you now have a 16X larger array to load from memory, which is inherently slow.

```
b = np.random.random(int(16e6))

%timeit np.average(b[::16])
```
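One way to test that suspicion is to compare `b[::16]` against a contiguous copy of the same 1e6 values (made before timing starts) and against `a`. If the packed copy runs about as fast as `a`, the extra cost comes from pulling widely spaced elements out of the larger array rather than from the averaging itself. This is a hypothetical experiment sketch, not something run in the thread; `mean_seconds` is an illustrative helper.

```python
import timeit
from functools import partial

import numpy as np


def mean_seconds(fn, number=50):
    """Mean wall-clock seconds per call of fn()."""
    return timeit.timeit(fn, number=number) / number


if __name__ == "__main__":
    a = np.random.random(int(1e6))
    b = np.random.random(int(16e6))
    strided = b[::16]                       # view: 1e6 elements spread across 128 MB
    packed = np.ascontiguousarray(strided)  # copy: the same 1e6 values packed into 8 MB
    for label, arr in (("a", a), ("b[::16]", strided), ("contiguous copy", packed)):
        ms = mean_seconds(partial(np.average, arr)) * 1e3
        print(f"{label:16s} {ms:.3f} ms")
```

All three cases average exactly 1e6 float64 values, so any timing difference reflects memory layout rather than operation count.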

My conclusion: people should use NumPy the right way. It is really hard to beat pure NumPy speed.


nerdponx 969 days ago

But that's precisely what makes this a good exercise: you can see how far you can close the gap between the naive looping implementation and the optimized array implementation.


Elucalidavah 969 days ago

> np.average

But that's not the function in the article. The article implements `(a + b) / 2`.

And on my system, a simple `return (arr1 + arr2) / 2` takes 1.2 ms, while `average_arrays_4` takes 0.74 ms.
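Part of the cost of `(arr1 + arr2) / 2` is memory allocation: each call creates a fresh result array, and possibly a temporary for `arr1 + arr2` as well (NumPy can elide that temporary on some platforms). A minimal sketch that does the same arithmetic into one preallocated buffer via the ufunc `out=` argument is shown below. The name `average_into` is hypothetical, and whether this closes the gap to `average_arrays_4` on a given machine is untested here.

```python
import timeit

import numpy as np


def average_into(arr1, arr2, out=None):
    """Compute (arr1 + arr2) / 2 elementwise, writing into `out` (assumes float inputs)."""
    if out is None:
        out = np.empty_like(arr1)
    np.add(arr1, arr2, out=out)  # out = arr1 + arr2, with no intermediate array
    np.divide(out, 2, out=out)   # halve in place; the same operation as "/ 2"
    return out


if __name__ == "__main__":
    arr1 = np.random.random(int(1e6))
    arr2 = np.random.random(int(1e6))
    buf = np.empty_like(arr1)
    n = 200
    t_expr = timeit.timeit(lambda: (arr1 + arr2) / 2, number=n) / n
    t_out = timeit.timeit(lambda: average_into(arr1, arr2, buf), number=n) / n
    print(f"(arr1 + arr2) / 2: {t_expr * 1e3:.3f} ms")
    print(f"average_into:      {t_out * 1e3:.3f} ms")
```

Reusing `buf` across calls is the point of the design. If a new `out` is allocated each time, most of the benefit disappears.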


thatsit 969 days ago

A few years ago I tried to beat the C/C++ compiler on speed using manual SIMD instructions versus pure C/C++. It didn’t work out…

I can only imagine that this is already baked into NumPy now.


cozzyd 969 days ago

You usually have to unroll your loops for it to help (unless compilers have gotten smarter about data dependencies).
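Unrolling a reduction usually means splitting it across several independent accumulators, so each addition no longer waits on the result of the previous one. The Python below only illustrates the shape of that transformation. In CPython it will not run faster, because the benefit appears in compiled code (C, or a JIT). For floating-point sums, compilers generally won't make this reordering themselves, since it changes rounding, unless told to relax IEEE semantics (for example with `-ffast-math` in GCC/Clang). `sum_unrolled4` is a hypothetical name.

```python
def sum_unrolled4(values):
    """Sum a sequence using four independent accumulators (loop unrolled by 4)."""
    acc0 = acc1 = acc2 = acc3 = 0.0
    n = len(values)
    limit = n - n % 4
    for i in range(0, limit, 4):
        # These four additions don't depend on each other, so compiled code
        # could overlap them in the pipeline or pack them into one SIMD add.
        acc0 += values[i]
        acc1 += values[i + 1]
        acc2 += values[i + 2]
        acc3 += values[i + 3]
    # Handle the 0-3 leftover elements.
    tail = 0.0
    for i in range(limit, n):
        tail += values[i]
    # Combine the partial sums; the order differs from a left-to-right sum,
    # so the last bits of a float result may differ slightly.
    return (acc0 + acc1) + (acc2 + acc3) + tail
```

For example, `sum_unrolled4([1.0] * 10)` gives `10.0`: eight elements go through the unrolled loop and two through the tail loop.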
