Hacker News new | ask | show | jobs
by yjftsjthsd-h 1181 days ago
> The library was already using numpy for a lot of its calculations, so why should we expect Rust to be better?

I literally clicked in to read the article to see if they'd mention this:) But... unless I missed it, there wasn't really an answer? I thought numpy does do the heavy lifting in native code, so why is this faster? Does this version just push more of the logic into native code than numpy did?

5 comments

Numpy is fast when the code is vectorized. The code they are benchmarking against was not vectorized. They wanted to calculated the distances of n points against a given point and find out which points are closer than a threshold (max_dist). Instead of vectorizing the whole operation, the python code was just calling numpy in a loop to find the distance of two points.

Just that small change already gives 10x faster performance without ever leaving python/numpy land.

> They wanted to calculated the distances of n points against a given point and find out which points are closer than a threshold (max_dist).

Scipy should have already implemented such thing. Scikit-Learn also. Because KNN clustering is exactly doing this kind of work.

They did have a choice quote a bit before the blurb you quoted. The real world problem is sufficiently more complex/different enough that vectorizing it would be a pain.

>It’s worth noting that converting parts of / everything to vectorized numpy might be possible for this toy library, but will be nearly impossible for the real library while making the code much less readable and modifiable...

Thanks, I managed to miss that
The article mentioned that there are some gains from using vectorised numpy methods, i.e. spending more time in numpy code. I would be interested in whether the List[Polygon] could be converted into two long arrays of all xs and all ys with indices into the starts (essentially a dense representation of a sparse array) and then the core function rewritten for Numba since it could now not use any Python objects. This would break the interface of course, but may get within striking distance of the Rust implementation.
The slowness comes from the interaction of numpy and a Python object "Polygon", which in not numpy. I suspect that a sufficiently clever coder could have optimized the result without resorting to Rust, but at the cost of a substantial increase in complexity of the codebase. The proposed approach keep the Python code simple (and moves the complexity into having another language to deal with).

    diff --git a/poly_match_v1.py b/poly_match_v1.py
    index 675c88a..4293a46 100644
    --- a/poly_match_v1.py
    +++ b/poly_match_v1.py
    @@ -1,4 +1,5 @@
     from functools import cached_property
    +from itertools import compress
     from typing import List, Tuple
     import numpy as np
     from dataclasses import dataclass
    @@ -56,11 +57,8 @@ def generate_example() -> Tuple[List[Polygon], List[np.array]]:
     def find_close_polygons(
  polygon_subset: List[Polygon], point: np.array, max_dist: float
     ) -> List[Polygon]:
    -    close_polygons = []
    -    for poly in polygon_subset:
    -        if np.linalg.norm(poly.center - point) < max_dist:
    -            close_polygons.append(poly)
    -
    +    dist = np.linalg.norm([poly.center for poly in polygon_subset] - point, axis=1)
    +    close_polygons = list(compress(polygon_subset, dist < max_dist))
  return close_polygons
10x faster than the original, without resorting to native code, and without substantial increase in complexity of code base.
I wish the author would take your suggestion (and other recommendations) here and try it out again. Would be a very interesting follow up to read: "How we speed up Python and simplifying our codebase by removing on more dependency".
Focusing on speeding up find_close_polygons instead of realizing that you're matching many points against the same set of polygons is also unfortunate, since that function being slow is a red herring. You can create a scipy.spatial.KDTree for example and just query all the points against that.
People can write slow code in any language ;-) We've had to fire contractors who "wrote" dataframe code but was not vectorized in practice despite repeat request to do so. Same thing for slow code in CUDA.

From a maintenance view, I much prefer folks write vectorized data frames vs numpy or low level bindings, but that comes from having lived with the alternative for a lot longer. All of our exceptions are pretty much slotted for deletion. (Our fast path is single or multi-GPU dataframes in python.) Here's to hoping that one day we'll have dependent types in mypy!

> People can write slow code in any language ;-)

That was a lot of my curiosity, because I'm not quite a good enough programmer to know whether the problem is just that the original code is bad or does something super inefficient, and so rewriting it any language will let you make massive improvements.