Hacker News new | ask | show | jobs
by falcolas 4404 days ago
I saw a presentation on an interesting hack done with InnoDB primary keys used to cluster similar data.

A hotel booking company created a primary key which would put geographically similar entries into primary keys which were close together. i.e. there wouldn't be an entry for a hotel in Saigon ordered between two entries for San Francisco.

I don't recall the specifics of how they did it, but due to how InnoDB orders pages on disk according to their primary key, it meant that all of the results for a particular location could be accessed by fetching one to two pages of data from disk.

Resulted in some huge performance gains for them, both by reducing the number of disk IOs required for your average query, and by reducing the on-the-fly geographical calculations they had to do in the DB. They could do a range query on the primary key in the database and then do fine-grained geolocation on a less contended resource against a small number of entries.

2 comments

The presentation was probably this one: http://www.percona.com/live/mysql-conference-2014/sites/defa...

Slides 16-18.

A Z-order curve maps from two (or more) dimensions to one while preserving locality: http://en.wikipedia.org/wiki/Z-order_curve
Depending on your application you might want to pick a different space filling curve (http://en.wikipedia.org/wiki/Space-filling_curve). Different curves lead to different characteristics for nearness. You can also use multiple types of curves or two of the same curve oriented differently and then use the minimum distance of the curves as your distance estimate for some applications.

If you just want to generate keys you should look at Locality-Sensitive Hashing (http://en.wikipedia.org/wiki/Locality-sensitive_hashing).

That is most likely it, I was guessing they used the interleaving method presented there, but didn't know the formal term.