
by jdefarge 5492 days ago

This column-family/column/super-column lingo that Cassandra uses just makes its data model harder to understand. In fact, the model is quite simple:

Keyspace: a hash table that holds your application data. Okay, the table is distributed among nodes (i.e., a DHT), but it's still a hash table.

Row: an entry in the above hash table whose value is a collection of "column families".

Column Family: a key-value table (I avoid calling it a hash table because I don't remember whether it's implemented as one). A better name for this thing would be 'Attribute Set'.

Column: a key-value pair (with a timestamp). Thinking of it as a column just blurs the concept. Better name: 'Attribute'.

Note: different rows can have different sets of attributes in the same Column Family, so the concept of a 'column' breaks down quite easily.

Super-column: key-value pair where the value is yet another key-value table! Better(?) name: 'Super-Attribute'.
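To make the nesting concrete, here is a minimal sketch of the model above as plain Python dicts. The type names, the `get` helper, and the sample rows ("user:alice", "Profile", "Addresses", ...) are all hypothetical illustrations, not Cassandra's API, and plain dicts ignore how Cassandra actually orders or stores columns.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Column:
    """An 'Attribute': a value plus its timestamp (the column name is the dict key)."""
    value: bytes
    timestamp: int


# Hypothetical aliases mirroring the vocabulary above.
SuperColumn = Dict[str, Column]                        # 'Super-Attribute'
ColumnFamily = Dict[str, Union[Column, SuperColumn]]   # 'Attribute Set'
Row = Dict[str, ColumnFamily]                          # column-family name -> column family
Keyspace = Dict[str, Row]                              # row key -> row

keyspace: Keyspace = {
    "user:alice": {
        "Profile": {
            "email": Column(b"alice@example.com", 1),
            "city": Column(b"Lisbon", 1),
        },
    },
    "user:bob": {
        "Profile": {
            # Same column family as alice's row, but a different set of columns.
            "email": Column(b"bob@example.com", 2),
            "phone": Column(b"555-0100", 2),
        },
        "Addresses": {
            # Super-column: its value is yet another name -> Column table.
            "home": {
                "street": Column(b"1 Main St", 3),
                "zip": Column(b"12345", 3),
            },
        },
    },
}


def get(ks: Keyspace, row_key: str, cf: str, *names: str) -> Column:
    """Follow one key per nesting level: row -> column family -> column [-> sub-column]."""
    node: object = ks[row_key][cf]
    for name in names:
        node = node[name]  # type: ignore[index]
    if not isinstance(node, Column):
        raise KeyError(f"{names!r} does not resolve to a single column")
    return node
```

For example, `get(keyspace, "user:bob", "Addresses", "home", "zip")` walks row key, column family, super-column, and sub-column to reach `b"12345"`, while `"phone"` exists in bob's `Profile` but not in alice's.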

So Cassandra's data model is in fact a nested set of key-value tables, while Dynamo's model is flat (a single-level hash table). Oh! Last but not least, it's not a column store: its on-disk storage is row-oriented.
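The row-oriented vs. column-oriented distinction can be illustrated with a toy layout. This is a conceptual sketch only, assuming a simplified model (no timestamps, no super-columns) and hypothetical function names; it does not reproduce Cassandra's real on-disk format.

```python
from typing import Dict, List, Tuple

# Toy nested model: row key -> column family -> column name -> value.
Nested = Dict[str, Dict[str, Dict[str, str]]]

data: Nested = {
    "alice": {"Profile": {"city": "Lisbon", "email": "a@example.com"}},
    "bob": {"Profile": {"email": "b@example.com", "phone": "555-0100"}},
}


def row_oriented(ks: Nested) -> List[Tuple[str, str, str, str]]:
    """Lay cells out row by row: all cells of one row sit next to each other."""
    return [
        (row, cf, col, val)
        for row in sorted(ks)
        for cf in sorted(ks[row])
        for col, val in sorted(ks[row][cf].items())
    ]


def column_oriented(ks: Nested) -> Dict[str, List[Tuple[str, str]]]:
    """Lay cells out column by column, as a true column store would."""
    out: Dict[str, List[Tuple[str, str]]] = {}
    for row in sorted(ks):
        for cf in ks[row]:
            for col, val in ks[row][cf].items():
                out.setdefault(f"{cf}:{col}", []).append((row, val))
    return out
```

In the row-oriented layout, reading everything about `"bob"` touches one contiguous run of cells; in the column-oriented layout, the same read has to gather pieces from every column group, while scanning one column across all rows becomes cheap.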