by alex_smart
1172 days ago
NumPy is fast when the code is vectorized, and the code they benchmarked against was not. The task was to calculate the distances from n points to a given point and find which points are closer than a threshold (max_dist). Instead of vectorizing the whole operation, the Python code called NumPy in a loop, once per point, to compute the distance between two points. Vectorizing just that step already gives 10x faster performance without ever leaving Python/NumPy land.
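A minimal sketch of the difference, assuming the points are an (n, d) NumPy array and the distance is Euclidean. The function names, random data, and the loop version are hypothetical reconstructions for illustration, not the code from the benchmark:

```python
import numpy as np

def close_points_loop(points, target, max_dist):
    # Hypothetical slow pattern: a Python loop making one small NumPy call per point.
    return [i for i, p in enumerate(points)
            if np.linalg.norm(p - target) < max_dist]

def close_points_vectorized(points, target, max_dist):
    # Vectorized pattern: broadcast target against all n rows at once,
    # compute every Euclidean distance in a single call, then threshold.
    dists = np.linalg.norm(points - target, axis=1)
    return np.nonzero(dists < max_dist)[0]

# Illustrative data only (assumed, not from the benchmark).
rng = np.random.default_rng(0)
points = rng.random((10_000, 3))
target = np.array([0.5, 0.5, 0.5])
max_dist = 0.2

slow = close_points_loop(points, target, max_dist)
fast = close_points_vectorized(points, target, max_dist)
```

Both return the same indices; the vectorized version moves the per-point loop out of the interpreter and into NumPy's compiled code, which is where the speedup comes from.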
SciPy should already have something like this implemented, and so should scikit-learn, because k-nearest-neighbor (KNN) search does exactly this kind of work.
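For what it's worth, SciPy does ship tools for this. A minimal sketch, again assuming an (n, d) array of points and made-up data: `scipy.spatial.distance.cdist` does the brute-force distance computation, and `scipy.spatial.KDTree.query_ball_point` answers the same radius query from a prebuilt k-d tree (scikit-learn's `NearestNeighbors.radius_neighbors` covers similar ground). Note that both checks here use `<=`, since `query_ball_point` returns points within distance r inclusive, unlike the strict `<` above:

```python
import numpy as np
from scipy.spatial import KDTree, distance

# Illustrative data only (assumed).
rng = np.random.default_rng(0)
points = rng.random((10_000, 3))
target = np.array([0.5, 0.5, 0.5])
max_dist = 0.2

# Brute force: cdist computes all pairwise distances between the rows of
# two 2-D arrays; here that is a 1 x n matrix, so take row 0.
d = distance.cdist(target[None, :], points)[0]
idx_cdist = np.nonzero(d <= max_dist)[0]

# k-d tree: build once, then each radius query avoids scanning every point.
# Sort explicitly, since single-point queries are not guaranteed to be sorted.
tree = KDTree(points)
idx_tree = np.sort(tree.query_ball_point(target, r=max_dist))
```

The tree only pays off when it is queried many times; for a single query, the vectorized brute force is usually simpler and fast enough.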