| The column-family/column/super-column lingo that Cassandra uses just makes its data model harder to understand. In fact, it's quite simple: Keyspace: a hash table that holds your application data. Okay, the table is distributed among nodes (i.e., a DHT), but it's still a hash table. Row: an entry in the above hash table, where each value is composed of a collection of "column families". Column Family: a key-value table (I avoid calling it a hash table because I don't remember whether it's implemented as one). A better name for this thing would be 'Attribute Set'. Column: a key-value pair (with a timestamp). Thinking about it as a column just blurs the concept. Better name: 'Attribute'. Note: different rows can have different sets of attributes within the same Column Family, so the notion of a 'column' breaks quite easily. Super-column: a key-value pair where the value is yet another key-value table! Better(?) name: 'Super-Attribute'. So Cassandra's data model is in fact a nested set of key-value tables, while Dynamo's model is flat (just a one-level hash table). Oh! Last but not least, it's not a column store. Its on-disk storage is row-oriented. |
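
To make the nesting concrete, here is a minimal sketch using plain Python dicts. It only illustrates the shape of the model described above and is not Cassandra's API; the row keys, family names, attribute names, and the helpers `get_column` and `get_super_column` are all hypothetical.

```python
# Illustration only: nested dicts standing in for the data model.
# None of these names come from Cassandra; they are made up for the example.

# A column ("attribute") is modelled as name -> (value, timestamp).
keyspace = {
    "alice": {                                   # row key
        "Profile": {                             # column family ("attribute set")
            "email": ("alice@example.com", 1),
            "city": ("Lisbon", 1),
        },
        "Friends": {                             # family holding super-columns
            "carol": {                           # super-column: value is another table
                "since": ("2009", 3),
            },
        },
    },
    "bob": {
        "Profile": {                             # same family, different attributes
            "email": ("bob@example.com", 2),
            "phone": ("555-0100", 2),
        },
    },
}


def get_column(ks, row_key, family, name):
    """Look up one attribute value, dropping its timestamp."""
    value, _timestamp = ks[row_key][family][name]
    return value


def get_super_column(ks, row_key, family, super_name, name):
    """Same lookup, one level deeper, for a super-column."""
    value, _timestamp = ks[row_key][family][super_name][name]
    return value


print(get_column(keyspace, "alice", "Profile", "city"))
print(get_super_column(keyspace, "alice", "Friends", "carol", "since"))
```

Note that "alice" and "bob" share the `Profile` family but not the same attribute names, which is why 'column' is a misleading word here. A flat, Dynamo-style model would correspond to just the outermost dict mapping a key to a single opaque value.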