Hacker News new | ask | show | jobs
by ibdknox 3219 days ago
Is there anything written about your multi-way join algorithm? We've done some really interesting implementations of Generic Join and the like and are always interested in hearing about other potentially better join algorithms.
1 comments

It's only described in my thesis, which is not yet publicly available. (It should be, but the university thesis publication process seems to have a latency measured in years...)

The case for triangle joins is simple to describe, though. Suppose we have attributes A, B and C and finite maps (relations) R : A x B -> X, S : A x C -> Y and T : B x C -> Z. Now build tries for each so we have

abx : Trie A (Trie B X)

acy : Trie A (Trie C Y)

bcz : Trie B (Trie C Z)

We can join two tries using join : Trie k v1 -> Trie k v2 -> (v1 -> v2 -> v) -> Trie k v. Putting this all together we have:

triangleJoin : Trie A (Trie B (Trie C (X, Y, Z)))

triangleJoin = join abx acy (\bx cy -> join bx bcz (\x cz -> join cy cz (\y z -> (x, y, z))))

Compared to a database that doesn't know how to handle triangle joins the difference is absolutely stunning. The above one-liner managed to do in 5 seconds what took MySQL over an hour...

Is there any way I could get a preprint of your thesis?

I've been working on a project that sits somewhere between spark and riak and this actually might address a core issue I have both at the node and query dispatch level.