MySQL (InnoDB) clustered and non-clustered indexes

Y	Hacker News new \| ask \| show \| jobs

	MySQL (InnoDB) clustered and non-clustered indexes (n3n.in)
	23 points by skipass 4404 days ago

4 comments

falcolas 4404 days ago

I saw a presentation on an interesting hack done with InnoDB primary keys used to cluster similar data.

A hotel booking company created a primary key which would put geographically similar entries into primary keys which were close together. i.e. there wouldn't be an entry for a hotel in Saigon ordered between two entries for San Francisco.

I don't recall the specifics of how they did it, but due to how InnoDB orders pages on disk according to their primary key, it meant that all of the results for a particular location could be accessed by fetching one to two pages of data from disk.

Resulted in some huge performance gains for them, both by reducing the number of disk IOs required for your average query, and by reducing the on-the-fly geographical calculations they had to do in the DB. They could do a range query on the primary key in the database and then do fine-grained geolocation on a less contended resource against a small number of entries.

link

morgo 4391 days ago

The presentation was probably this one: http://www.percona.com/live/mysql-conference-2014/sites/defa...

Slides 16-18.

link

misframer 4404 days ago

A Z-order curve maps from two (or more) dimensions to one while preserving locality: http://en.wikipedia.org/wiki/Z-order_curve

link

fredophile 4404 days ago

Depending on your application you might want to pick a different space filling curve (http://en.wikipedia.org/wiki/Space-filling_curve). Different curves lead to different characteristics for nearness. You can also use multiple types of curves or two of the same curve oriented differently and then use the minimum distance of the curves as your distance estimate for some applications.

If you just want to generate keys you should look at Locality-Sensitive Hashing (http://en.wikipedia.org/wiki/Locality-sensitive_hashing).

link

falcolas 4404 days ago

That is most likely it, I was guessing they used the interleaving method presented there, but didn't know the formal term.

link

snake_plissken 4404 days ago

A nice little article. I've always felt that database indexes are one of those fringe topics that's just esoteric enough that people acquire a basic knowledge of them and know they need them, yet understanding indexes and using them correctly leads amazing performance.

On clustered indexes in particular, I try to follow these principles:

Narrow - in terms of a data type's byte length, so that more keys can be packed into each level of the B-tree. You can end up having to traverse fewer intermediate levels to reach the leaf level, where the data resides.

Unique - This ties into the point above, there will be no need to add a "uniquifier" to the key, helping to save space and the overhead of managing that extra tid-bit of data

Static - ideally, never changing. By it's nature, a clustered index is ordered, and updating/changing the keys will lead to data that's in the wrong place. You can kind of get around this by rebuilding the index after every update, but that just adds another task you'll need to manage more often.

Increasing - this can lead to faster inserts; in a sense, the DB is just filling up the last page of the index and then adding another when it needs to. (I think after a certain amount of inserts, eventually you'll have to re-build the index (to add more intermediate level nodes) but I can't recall the specifics, and you're going to have to do rebuild indexes anyway with all indexing strategies.)

link

gearhart 4404 days ago

If, like me, you found the idea interesting but the article a bit confused there's another good discussion on it here: http://use-the-index-luke.com/blog/2014-01/unreasonable-defa...

link

misframer 4404 days ago

I think it would have been easier for me to understand if they put everything in terms of a B+ tree, where the leaf nodes contain keys and values.

link

meritt 4404 days ago

InnoDB does indeed use a B+ tree. A more thorough article on the topic is http://blog.jcole.us/2013/01/10/btree-index-structures-in-in... or the book High Performance MySQL.

link

peterwwillis 4404 days ago

I remember when I was learning SQL, I thought "Why doesn't the database just look up the records faster by storing them in a B+ tr.... Oh, wait, it does that already." SQL is FAST!

link

misframer 4404 days ago

I know it uses a B+ tree. I wanted to mention that the article doesn't mention them at all.

I'm familiar with the High Performance MySQL book -- I work with one of the authors :).

link

meritt 4404 days ago

Cool. I was more taking the opportunity in replying to you to point out that this article isn't very good :)

link