Hacker News new | ask | show | jobs
by lzimm 5708 days ago
wont that completely negate the value of the indices that lat/lon may/may not sit on and result in a complete tablescan?
3 comments

I've got this on a site I'm building that offers a sort by distance option. There's definitely a performance hit. I'm going to look into 'dumber' ways of filtering out the data prior to running this function. Maybe start by including state in the where clause, for example.

Any other ideas?

One solution I saw used more simple arithmetic to calculate a range of coordinates within levels of distance. That could be pre-cached, but it's a lot less accurate.

Geohash: http://en.wikipedia.org/wiki/Geohash

(Edit: to be more specific, you can get a pretty good distance measurement using Geohash and comparing strings. Obviously, indexing strings is something databases do well. The exact distance a single character corresponds to depends on longitude & latitude, but there are lookup tables for that. There are also edge conditions to be aware of which may affect your application)

Or, use Postgres which has geospatial indexes.

Thanks for the pointer! This looks interesting. The edge conditions seem like they might pose a problem. I'll have to check out how often it would occur. Maybe the geospatial indexes are a better bet. It looks like MongoDB supports them also. Good excuse to try that out.
The edge cases happen all the time.

Using a B-Tree on a Geohash (like MongoDB does) is a bit more efficient that just indexing min/max values, but not by much. MySQL, PostgreSQL and even SQLite have R-Tree indices that perform 10x better.

If you are prepared to introduce new technology specifically to solve this problem, then you should take a look at LocalLucene, too: http://www.gissearch.com/locallucene
I've been using either MySQL with Sphinx or MongoDB with the built-in geonear successfully.

If you're already using MongoDB, it's really dead-easy to setup (see the docs).

I haven't yet used either of these but PostGres earthdistance http://www.postgresql.org/docs/8.3/static/earthdistance.html or PostGis http://www.postgis.org/ might be good options (if you don't mind leaving MySQL behind that is).
Use tiles.
For what? How?
Hash the points into large tiles. Only calculate the nearby tiles, then find the items that are in the list of tiles (which is faster, due to being indexed.) Then use Haversine or whatever to filter.
I'm not sure what the author is doing, but take a look at this presentation: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-M...

You can basically reduce the filtering done to something like: WHERE lat BETWEEN val1 AND val2 AND lon BETWEEN val3 AND val4.

So indexing will work.

true, the distance calculation must NOT be in the WHERE clause if you want to use indexes (and you want).

What I'm doing, given a max distance and a search point, is to calculate the bounding box in which I want to search in filter results with

WHERE lat BETWEEN lat_min AND lat_max AND lng BETWEEN lng_min AND lng_max

Calculating latitude min/max is trivial knowing that 1 latitude degree is 111.2KM. Longitude is a bit more convoluted because longitude degree size changes moving north/south.

$lat_min = $lat - $range_km * (1 / 111.2); $lat_max = $lat + $range_km * (1 / 111.2);

$k = $range_km/6371.04; $lng_min = $lng - rad2deg($k/cos(deg2rad($lat))); $lng_max = $lng + rad2deg($k/cos(deg2rad($lat)));