Hacker News new | ask | show | jobs
by petschge 2605 days ago
I do plasma simulations and recently had the problem of finding the distance to the nearest neighbor for every of the particles in the simulation. Doing that naively is O(n^2) and took hours even for small test problems. Building an R-tree once and using if for nearest-neighbor look-ups brought that down to 5 minutes.

libspatialindex lacks documentation, but worked really nicely. The rtree interface in python is much friendlier.

4 comments

I’m taking Advanced Data Structures at UCSD right now and our first assignment was making a K-D Tree and an efficient KNN Classifier. It was surprisingly simple and the efficiency between the KD Tree and brute force implementation was quite drastic.

If you only build the tree once and do no insertions what is the benefit of an R-Tree vs KDTree?

I actually do plan to update the tree as I insert additional particles in locations where the distance to the nearest neighbor is large.
There is a library called flann that does approximate nearest neighbors using a kd-tree and best bin first method. It is typically used not only for nearest neighbors but finding them in higher dimensions.
Plasma simulation sounds really damn cool.
It's actually usually pretty hot. </pun>
I'm curious, how many particles do your simulations typically contain?
Typical full sized simulation runs contain between 100 million (1e8) and 100 billion (1e11) particles. Small tests while debugging might be as small as a few thousand particles (1e4). And tests of the code might only use one or two particles.

The thing I was working on when I switched from naive O(n^2) to R-trees had half a million (5e5) particles.