Hacker News new | ask | show | jobs
by ryanworl 2709 days ago
The CockroachDB blog post on this topic is a good summary. There is an additional trick that isn't directly KV related, but is important in a distributed environment when using a KV storage engine.

When defining a hierarchy of tables, such as customers -> orders -> order_line_items, you can make the primary key of the child tables contain the primary key of the parent table.

e.g. (customer_id) for the customers table, (customer_id, order_id) for the orders table, then (customer_id, order_id, line_item_id) for the order_line_items.

When this is stored on disk in a sorted format, it makes joins between these extremely cheap because the data will all be next to each other on disk.

CockroachDB calls this "interleaved tables".

2 comments

Interleaved tables are best for 1:1 relationships.
Hard disagree. This is the only way in a distributed, sorted KV store to get any semblance of data locality. If Cockroach and Spanner didn't do this, they would constantly be doing 2PC for modifying data that is related but stored on different groups of machines.
We are evaluating it for production and the senior engineer we talked to from the company told us that. I’d think he knows what he’s talking about.
I may be misinterpreting what you mean by a 1-1 relationship, but the documentation for Cloud Spanner, Cockroach, and the old FoundationDB SQL layer all use a similar schema to the one I described in their examples. Additionally, the F1 paper describes using it in the same way in Figure 2.
He said tables with a parent child relationship that is one to one meaning if I understand that right one parent ID mapping to one child ID. Its entirely possible either I’m misinterpreting him or he’s wrong but that’s what he said.
I think the data locality benefits would hold for one to many relationships too (that is, one parent with many children).
Also they do 2PC work in a 1PC way by maintaining a hidden transaction status table.
You can have hash sharding and be able to distribute tables easily compared to range sharding.
Can you link to the post you’re referring to?
https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin... this was in another comment.

https://emsal.me/blog/5 this blog post has a good introduction to their implementation of interleaved tables.