	by lzimm 5755 days ago
	Won't that completely negate the value of any indices that lat/lon may or may not sit on and result in a complete table scan?

3 comments

zachster 5755 days ago

I've got this on a site I'm building that offers a sort by distance option. There's definitely a performance hit. I'm going to look into 'dumber' ways of filtering out the data prior to running this function. Maybe start by including state in the where clause, for example.

Any other ideas?

One solution I saw used simpler arithmetic to calculate a range of coordinates for each distance level. That could be pre-cached, but it's a lot less accurate.


nl 5755 days ago

Geohash: http://en.wikipedia.org/wiki/Geohash

(Edit: to be more specific, you can get a pretty good distance measurement using Geohash and comparing strings. Obviously, indexing strings is something databases do well. The exact distance a single character corresponds to depends on longitude & latitude, but there are lookup tables for that. There are also edge conditions to be aware of which may affect your application)

Or, use Postgres which has geospatial indexes.
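
As a rough illustration of the Geohash idea (not anyone's production code), here is a minimal sketch of the standard encoding in Python. The helper names `geohash_encode` and `common_prefix_len` are made up for this example. A long shared prefix means two points fall in the same cell, but the reverse does not hold: two points a few metres apart on either side of a cell boundary can share no prefix at all, which is the edge condition mentioned above.

```python
# Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a lat/lon pair as a Geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def common_prefix_len(a, b):
    """Number of leading characters two hashes share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Two points roughly 30 m apart, straddling the equator and prime meridian.
    north_east = geohash_encode(0.0001, 0.0001)
    south_west = geohash_encode(-0.0001, -0.0001)
    print(north_east, south_west, common_prefix_len(north_east, south_west))
```

In a database, the prefix comparison would typically become something like `WHERE geohash LIKE 'u4pr%'` on an indexed string column, with the neighbouring cells queried as well to cover the boundary case.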


zachster 5755 days ago

Thanks for the pointer! This looks interesting. The edge conditions seem like they might pose a problem. I'll have to check out how often it would occur. Maybe the geospatial indexes are a better bet. It looks like MongoDB supports them also. Good excuse to try that out.


bravo_sierra 5755 days ago

The edge cases happen all the time.

Using a B-Tree on a Geohash (like MongoDB does) is a bit more efficient than just indexing min/max values, but not by much. MySQL, PostgreSQL and even SQLite have R-Tree indices that perform 10x better.


nl 5755 days ago

If you are prepared to introduce new technology specifically to solve this problem, then you should take a look at LocalLucene, too: http://www.gissearch.com/locallucene


thibaut_barrere 5755 days ago

I've been using either MySQL with Sphinx or MongoDB with the built-in geonear successfully.

If you're already using MongoDB, it's really dead easy to set up (see the docs).


bravo_sierra 5755 days ago

MySQL can use R-Trees too - http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.ht...


mthoms 5755 days ago

I haven't used either of these yet, but PostgreSQL's earthdistance http://www.postgresql.org/docs/8.3/static/earthdistance.html or PostGIS http://www.postgis.org/ might be good options (if you don't mind leaving MySQL behind, that is).


joshu 5755 days ago

Use tiles.


bravo_sierra 5755 days ago

For what? How?


joshu 5754 days ago

Hash the points into large tiles. Work out only the tiles near the search point, then find the items that are in that list of tiles (which is fast, because the tile column is indexed). Then use Haversine or whatever to filter.
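
One way the tile approach could look, as a sketch in Python rather than a reference implementation. The tile size, the in-memory dict standing in for an indexed tile column, and the names `tile_of`, `build_index` and `nearby` are all assumptions made for this example. It also assumes the search radius is smaller than one tile (so a 3x3 block of tiles is enough) and ignores wrap-around at the 180° meridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tile_of(lat, lon, tile_deg=1.0):
    """Map a coordinate to an integer (row, column) tile key."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))


def build_index(points, tile_deg=1.0):
    """Group (name, lat, lon) tuples by tile; stands in for an indexed column."""
    index = defaultdict(list)
    for name, lat, lon in points:
        index[tile_of(lat, lon, tile_deg)].append((name, lat, lon))
    return index


def nearby(index, lat, lon, radius_km, tile_deg=1.0):
    """Look only at the 3x3 block of tiles around the point, then filter exactly."""
    row, col = tile_of(lat, lon, tile_deg)
    hits = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            for name, plat, plon in index.get((row + d_row, col + d_col), []):
                dist = haversine_km(lat, lon, plat, plon)
                if dist <= radius_km:
                    hits.append((dist, name))
    return sorted(hits)


if __name__ == "__main__":
    places = [("a", 40.0, -74.0), ("b", 40.1, -74.1), ("c", 41.5, -74.0), ("d", 10.0, 10.0)]
    index = build_index(places)
    for dist, name in nearby(index, 40.05, -74.05, radius_km=20):
        print(f"{name}: {dist:.1f} km")
```

In SQL the same shape would be a `WHERE tile IN (...)` lookup against an indexed tile column, with the Haversine check applied only to the rows that come back.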


bad_user 5755 days ago

I'm not sure what the author is doing, but take a look at this presentation: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-M...

You can basically reduce the filtering done to something like: WHERE lat BETWEEN val1 AND val2 AND lon BETWEEN val3 AND val4.

So indexing will work.
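
To make the BETWEEN filter concrete, here is a small sketch using Python's built-in `sqlite3` module. The `places` table, the `idx_places_lat_lon` index and the helper names are invented for this example, and the presentation linked above targets MySQL, so treat it as an analogy rather than the same setup. `EXPLAIN QUERY PLAN` is included so you can check on your own data whether the range predicate actually uses the index.

```python
import sqlite3

# Plain range predicates on the columns, with no function call wrapped around
# lat or lon, so the planner is free to use an index on them.
BOX_QUERY = """
SELECT name FROM places
WHERE lat BETWEEN ? AND ?
  AND lon BETWEEN ? AND ?
ORDER BY name
"""


def make_db(rows):
    """Create an in-memory table of (name, lat, lon) rows with a lat/lon index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL)"
    )
    conn.execute("CREATE INDEX idx_places_lat_lon ON places (lat, lon)")
    conn.executemany("INSERT INTO places (name, lat, lon) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def in_box(conn, lat_min, lat_max, lon_min, lon_max):
    """Return the names of all places inside the bounding box."""
    cur = conn.execute(BOX_QUERY, (lat_min, lat_max, lon_min, lon_max))
    return [row[0] for row in cur]


def query_plan(conn):
    """Return SQLite's plan description for the box query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + BOX_QUERY, (0, 0, 0, 0)).fetchall()
    return [row[-1] for row in rows]  # last column holds the human-readable detail


if __name__ == "__main__":
    conn = make_db([
        ("inside", 40.70, -74.00),
        ("too far north", 41.90, -74.00),
        ("too far east", 40.70, -72.00),
    ])
    print(in_box(conn, 40.0, 41.0, -75.0, -73.0))
    print(query_plan(conn))
```

The box is only a coarse filter; an exact distance check on the surviving rows is still needed if a circular radius matters.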


codesink 5755 days ago

True, the distance calculation must NOT be in the WHERE clause if you want to use indexes (and you do).

What I'm doing, given a max distance and a search point, is calculating the bounding box I want to search in and filtering results with

WHERE lat BETWEEN lat_min AND lat_max AND lng BETWEEN lng_min AND lng_max

Calculating the latitude min/max is trivial, since one degree of latitude is about 111.2 km. Longitude is a bit more convoluted because the size of a longitude degree changes as you move north or south.

$lat_min = $lat - $range_km * (1 / 111.2);
$lat_max = $lat + $range_km * (1 / 111.2);

$k = $range_km / 6371.04;
$lng_min = $lng - rad2deg($k / cos(deg2rad($lat)));
$lng_max = $lng + rad2deg($k / cos(deg2rad($lat)));
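
For anyone who wants to experiment with the numbers, here is a Python port of the same bounding-box arithmetic, followed by an exact Haversine pass over whatever the box lets through. The function names are my own, the constants are the ones from the PHP above, and the sketch assumes the search point is not close to a pole (where the cosine approaches zero) and that the box does not cross the 180° meridian.

```python
import math

KM_PER_DEG_LAT = 111.2      # constant used in the PHP snippet
EARTH_RADIUS_KM = 6371.04   # constant used in the PHP snippet


def bounding_box(lat, lng, range_km):
    """Return (lat_min, lat_max, lng_min, lng_max) around a search point."""
    d_lat = range_km / KM_PER_DEG_LAT
    k = range_km / EARTH_RADIUS_KM  # range as an angle in radians
    d_lng = math.degrees(k / math.cos(math.radians(lat)))
    return lat - d_lat, lat + d_lat, lng - d_lng, lng + d_lng


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_range(points, lat, lng, range_km):
    """Cheap box test first (the part SQL can index), exact distance second."""
    lat_min, lat_max, lng_min, lng_max = bounding_box(lat, lng, range_km)
    result = []
    for name, plat, plng in points:
        if not (lat_min <= plat <= lat_max and lng_min <= plng <= lng_max):
            continue  # outside the box: never pays for trigonometry
        if haversine_km(lat, lng, plat, plng) <= range_km:
            result.append(name)
    return result


if __name__ == "__main__":
    points = [
        ("east", 0.0, 0.5),      # inside the box and inside the circle
        ("corner", 0.9, 0.9),    # inside the box, but outside the circle
        ("far", 5.0, 5.0),       # outside the box
    ]
    print(bounding_box(60.0, 0.0, 111.2))
    print(within_range(points, 0.0, 0.0, 111.2))
```

The "corner" point shows why the second pass matters: the box is a square drawn around the circle, so its corners admit points that are farther away than the requested range.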
