Hacker News new | ask | show | jobs
by elbee 4686 days ago
So out of the main b-tree operations (Insert/Replace/Delete/Seek/Next/Prev) you make a convincing argument that Tokutek can be faster than b-trees for inserts, if you use non-unique indexes (which automatically disqualifies the primary index in most cases, and makes foreign keys hard).

That is good but not quite "beat B-trees pretty handily across the board" :-)

1 comments

Many primary keys in the wild are sequential, which makes the unique check hit cache and not require any I/O.

Let me be clear: in the worst case with a unique index, Fractal Tree indexes are the same speed as B-tree indexes. In most cases, they're far better. When you add in compression and agility, it really is "beat B-trees across the board."

To be honest, I have no idea what "agility" is. Perhaps I am too cautious but I'll want to see a lot more data than some hand-picked benchmarks and a mystical "unique constraints aren't interesting" before I'm completely convinced.

[Random aside: I greatly dislike block (i.e. InnoDB-style) compression, which is what Tokutek seems to use. Decompressing the entire block just to retrieve one column of one row is crazily expensive, especially as the decompression cost goes up as the block size goes up. There is also the problem of deciding whether to store compressed blocks in the buffer cache (slow), decompressed blocks in the buffer cache (wastes ram) or to have a cache of uncompressed blocks (the InnoDB approach, which kind of combines both problems). I think that format-aware page or column compression (like Oracle, SQL Server or ESENT) is far more effective.]

I guess that this has gotten way off topic but I am glad there are people out there doing exciting new things in the B-tree space. I will keep a closer eye on Tokutek in the future.

Agility refers to our online schema change abilities in mysql (hot column add/delete, hot indexing).

Our compression technique is naive, you're right (but we don't decompress the whole 4MB for one single point query, as you may be thinking, we're a bit more sophisticated). We have some more ideas in the pipeline if we need them (basically a bunch of tricks you can play when you know the schema), but the fact remains that what we have right now is extremely effective and there isn't much incentive yet to finish off our smarter prototypes.

(Don't worry about off topic I love deep conversations like this. Come to our mailing list if you want to continue more though, we may be exhausting HN)
Actually, this is the type of back-and-forth I (and I think others) would love to see more of. A dozen replies in, lots of good ideas, and no one is calling anyone else an idiot or a troll. Feel free to continue!