Hacker News new | ask | show | jobs
by zachster 5716 days ago
I've got this on a site I'm building that offers a sort by distance option. There's definitely a performance hit. I'm going to look into 'dumber' ways of filtering out the data prior to running this function. Maybe start by including state in the where clause, for example.

Any other ideas?

One solution I saw used more simple arithmetic to calculate a range of coordinates within levels of distance. That could be pre-cached, but it's a lot less accurate.

3 comments

Geohash: http://en.wikipedia.org/wiki/Geohash

(Edit: to be more specific, you can get a pretty good distance measurement using Geohash and comparing strings. Obviously, indexing strings is something databases do well. The exact distance a single character corresponds to depends on longitude & latitude, but there are lookup tables for that. There are also edge conditions to be aware of which may affect your application)

Or, use Postgres which has geospatial indexes.

Thanks for the pointer! This looks interesting. The edge conditions seem like they might pose a problem. I'll have to check out how often it would occur. Maybe the geospatial indexes are a better bet. It looks like MongoDB supports them also. Good excuse to try that out.
The edge cases happen all the time.

Using a B-Tree on a Geohash (like MongoDB does) is a bit more efficient that just indexing min/max values, but not by much. MySQL, PostgreSQL and even SQLite have R-Tree indices that perform 10x better.

If you are prepared to introduce new technology specifically to solve this problem, then you should take a look at LocalLucene, too: http://www.gissearch.com/locallucene
I've been using either MySQL with Sphinx or MongoDB with the built-in geonear successfully.

If you're already using MongoDB, it's really dead-easy to setup (see the docs).

I haven't yet used either of these but PostGres earthdistance http://www.postgresql.org/docs/8.3/static/earthdistance.html or PostGis http://www.postgis.org/ might be good options (if you don't mind leaving MySQL behind that is).
Use tiles.
For what? How?
Hash the points into large tiles. Only calculate the nearby tiles, then find the items that are in the list of tiles (which is faster, due to being indexed.) Then use Haversine or whatever to filter.