| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by FreeHugs 1175 days ago

The most important part of the article seems to be that this Python code is taking "an avg of 293.41ms per iteration":

        def find_close_polygons(
            polygon_subset: List[Polygon], point: np.array, max_dist: float
        ) -> List[Polygon]:
            close_polygons = []
            for poly in polygon_subset:
                if np.linalg.norm(poly.center - point) < max_dist:
                    close_polygons.append(poly)

            return close_polygons

And after replacing it with this Rust code, it is taking "an avg of 23.44ms per iteration":

        use pyo3::prelude::*;
        use ndarray_linalg::Norm;
        use numpy::PyReadonlyArray1;

        #[pyfunction]
        fn find_close_polygons(
            py: Python<'_>,
            polygons: Vec<PyObject>,
            point: PyReadonlyArray1<f64>,
            max_dist: f64,
        ) -> PyResult<Vec<PyObject>> {
            let mut close_polygons = vec![];
            let point = point.as_array();
            for poly in polygons {
                let center = poly
                    .getattr(py, "center")?
                    .extract::<PyReadonlyArray1<f64>>(py)?
                    .as_array()
                    .to_owned();

                if (center - point).norm() < max_dist {
                    close_polygons.push(poly)
                }
            }

            Ok(close_polygons)
        }

Why is the Rust version 13x faster than the Python version?

6 comments

spi 1175 days ago

Yeah but the Python code is so bad that it's easy to get a 10x speedup using only numpy, as well. The current code essentially does:

    import numpy as np

    n_sides = 30
    n_polygons = 10000

    class Polygon:
        def __init__(self, x, y):
            self.x = x
            self.y = y
            self.center = np.array([self.x, self.y]).mean(axis=1)


    def find_close_polygons(
        polygon_subset: List[Polygon], point: np.array, max_dist: float
    ) -> List[Polygon]:
        close_polygons = []
        for poly in polygon_subset:
            if np.linalg.norm(poly.center - point) < max_dist:
                close_polygons.append(poly)

        return close_polygons

    polygons = [Polygon(*np.random.rand(2, n_sides)) for _ in range(n_polygons)]
    point = np.array([0, 0])
    max_dist = 0.5

    %timeit find_close_polygons(polygons, point, max_dist)

(I've made up number of sides and number of polygons to get to the same order of magnitude of runtime; also I've pre-computed centers, as they are cached anyway in their code), which on my machine takes about 40ms to run. If we just change the function to:

    def find_close_polygons(
        polygon_subset: List[Polygon], point: np.array, max_dist: float
    ) -> List[Polygon]:
        centers = np.array([polygon.center for polygon in polygon_subset])
        mask = np.linalg.norm(centers - point[None], axis=1) < max_dist
        return [
            polygon
            for polygon, is_pass in zip(polygon_subset, mask)
            if is_pass
        ]

then the same computation takes 4ms on my machine.

Doing a Python loop of numpy operations is a _bad_ idea... The new code hardly even takes more space than the original one.

(as someone else mentioned in the comments, you can also directly use the sum of the squares rather than `np.linalg.norm` to avoid taking square roots and save a few microseconds more, but well, we're not in that level of optimization here)