| HN Mirror



	by alex_smart 1172 days ago
	NumPy is fast when the code is vectorized. The code they were benchmarking against was not vectorized. They wanted to calculate the distances from n points to a given point and find out which points are closer than a threshold (max_dist). Instead of vectorizing the whole operation, the Python code was just calling NumPy in a loop to compute the distance between two points at a time. Just that small change, vectorizing the loop, already gives 10x faster performance without ever leaving Python/NumPy land.
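A minimal sketch of the difference being described, assuming the points are stored as an (n, d) NumPy array. The function names are hypothetical and this is not the benchmarked code itself. The loop version makes one small NumPy call per point, while the vectorized version uses broadcasting to compute all n distances in a single call:

```python
import numpy as np

def close_points_loop(points, center, max_dist):
    # Hypothetical loop version: one tiny NumPy call per point,
    # so per-call overhead dominates the runtime.
    result = []
    for i, p in enumerate(points):
        dist = np.sqrt(np.sum((p - center) ** 2))
        if dist < max_dist:
            result.append(i)
    return result

def close_points_vectorized(points, center, max_dist):
    # Hypothetical vectorized version: broadcasting subtracts `center`
    # from every row at once, then distances are reduced along axis 1.
    dists = np.sqrt(np.sum((points - center) ** 2, axis=1))
    # Indices of the points strictly closer than max_dist
    return np.nonzero(dists < max_dist)[0]
```

Both return the indices of the matching points. Any speedup you measure will depend on n, d and the machine.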

1 comment

> They wanted to calculate the distances of n points against a given point and find out which points are closer than a threshold (max_dist).

SciPy should already implement such a thing, and so should scikit-learn, because nearest-neighbor methods like KNN do exactly this kind of work.
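For example, SciPy's `scipy.spatial.KDTree.query_ball_point` answers exactly this "all points within radius r" query, and scikit-learn offers a similar `NearestNeighbors.radius_neighbors`. A minimal sketch using the SciPy version follows. The wrapper name is hypothetical. Note that `query_ball_point` includes points at distance exactly `r` (<=), whereas a strict `< max_dist` check does not:

```python
import numpy as np
from scipy.spatial import KDTree

def close_points_kdtree(points, center, max_dist):
    # Hypothetical wrapper: build a k-d tree over all points. Building it
    # pays off mainly when the same points are queried against many centers.
    tree = KDTree(points)
    # Indices of all points within Euclidean distance max_dist (inclusive)
    idx = tree.query_ball_point(center, r=max_dist)
    # Single-point queries are not sorted by default, so sort for stable output
    return sorted(int(i) for i in idx)
```

For a single query against a fixed set of points, the plain vectorized NumPy version is usually simpler. The tree helps when you repeat the query many times.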