| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by zachster 5716 days ago

I've got this on a site I'm building that offers a sort by distance option. There's definitely a performance hit. I'm going to look into 'dumber' ways of filtering out the data prior to running this function. Maybe start by including state in the where clause, for example.

Any other ideas?

One solution I saw used more simple arithmetic to calculate a range of coordinates within levels of distance. That could be pre-cached, but it's a lot less accurate.

3 comments

nl 5716 days ago

Geohash: http://en.wikipedia.org/wiki/Geohash

(Edit: to be more specific, you can get a pretty good distance measurement using Geohash and comparing strings. Obviously, indexing strings is something databases do well. The exact distance a single character corresponds to depends on longitude & latitude, but there are lookup tables for that. There are also edge conditions to be aware of which may affect your application)

Or, use Postgres which has geospatial indexes.

link

zachster 5716 days ago

Thanks for the pointer! This looks interesting. The edge conditions seem like they might pose a problem. I'll have to check out how often it would occur. Maybe the geospatial indexes are a better bet. It looks like MongoDB supports them also. Good excuse to try that out.

link

bravo_sierra 5716 days ago

The edge cases happen all the time.

Using a B-Tree on a Geohash (like MongoDB does) is a bit more efficient that just indexing min/max values, but not by much. MySQL, PostgreSQL and even SQLite have R-Tree indices that perform 10x better.

link

nl 5716 days ago

If you are prepared to introduce new technology specifically to solve this problem, then you should take a look at LocalLucene, too: http://www.gissearch.com/locallucene

link

thibaut_barrere 5716 days ago

I've been using either MySQL with Sphinx or MongoDB with the built-in geonear successfully.

If you're already using MongoDB, it's really dead-easy to setup (see the docs).

link

bravo_sierra 5716 days ago

MySQL can use R-Trees too - http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.ht...

link

mthoms 5716 days ago

I haven't yet used either of these but PostGres earthdistance http://www.postgresql.org/docs/8.3/static/earthdistance.html or PostGis http://www.postgis.org/ might be good options (if you don't mind leaving MySQL behind that is).

link

joshu 5716 days ago

Use tiles.

link

bravo_sierra 5716 days ago

For what? How?

link

joshu 5715 days ago

Hash the points into large tiles. Only calculate the nearby tiles, then find the items that are in the list of tiles (which is faster, due to being indexed.) Then use Haversine or whatever to filter.

link