by sergei
4895 days ago
It's different from a typical sharding approach (including what MongoDB does). In that model, you pick a single key (e.g. user_id) and distribute your data by that key. The problem surfaces when you look at secondary indexes. If you have a secondary index on, say, user_location and you want to query by it, you don't know which shard to go to, so you end up broadcasting to all of them. Another problem is enforcing unique index constraints, since a duplicate could be sitting on any shard.

With Clustrix, every table and index gets its own distribution. So if you have a schema like this:

foo(a, b, c, d)
unique idx1(b, c)
idx2(d)

then the table, idx1, and idx2 are each distributed separately, by their own keys. If I need to look something up by d, I know exactly which node has the data. I can also enforce index uniqueness, because any two rows with the same (b, c) map to the same place in idx1.
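
To make the idea concrete, here is a rough Python sketch of per-index distribution for the foo schema above. It is only an illustration, not how Clustrix is actually implemented: the `Cluster` class, `node_for`, `NUM_NODES`, and the SHA-256 placement are all made up for the example, and each "node" is just a dict in memory.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size, chosen only for this example


def node_for(*key):
    """Map a key tuple to a node number with a stable hash (an assumed placement rule)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class Cluster:
    """Toy model of foo(a, b, c, d) with unique idx1(b, c) and idx2(d)."""

    def __init__(self):
        # Three independent distributions, each with one dict per node.
        self.base = [dict() for _ in range(NUM_NODES)]  # a -> full row
        self.idx1 = [dict() for _ in range(NUM_NODES)]  # (b, c) -> a
        self.idx2 = [dict() for _ in range(NUM_NODES)]  # d -> set of a

    def insert(self, a, b, c, d):
        base_node = node_for(a)
        idx1_node = node_for(b, c)
        # Both uniqueness checks touch exactly one node each, because equal
        # keys always hash to the same node within their distribution.
        if a in self.base[base_node]:
            raise ValueError(f"duplicate primary key a={a!r}")
        if (b, c) in self.idx1[idx1_node]:
            raise ValueError(f"duplicate idx1 key (b, c)=({b!r}, {c!r})")
        self.base[base_node][a] = (a, b, c, d)
        self.idx1[idx1_node][(b, c)] = a
        self.idx2[node_for(d)].setdefault(d, set()).add(a)

    def lookup_by_d(self, d):
        """Find rows by d: one idx2 node, then one base node per match. No broadcast."""
        keys = self.idx2[node_for(d)].get(d, set())
        return [self.base[node_for(a)][a] for a in sorted(keys)]


cluster = Cluster()
cluster.insert(1, "x", "y", "nyc")
cluster.insert(2, "x", "z", "nyc")
rows = cluster.lookup_by_d("nyc")  # consults a single idx2 node for the index entry
```

The contrast with single-key sharding is that a lookup by d there would have to ask every shard, while here the index entry for d lives on exactly one node, and the same holds for the (b, c) uniqueness check on insert.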