Sorry, I was speaking loosely. More formally: in a system which uses LWW as its conflict resolution strategy, there are no circumstances under which you can guarantee that a value written to a given key will be causally connected to any future state of the system. The only exceptions are when all values written to that key are identical, or when a strong external coordinator (e.g. ZooKeeper) orders the timestamps. If you have siblings and vclocks, you can recover that causal connection guarantee for arbitrary write patterns, at least over CRDTs.

Since Cassandra did not (until today) offer transactional isolation for any type of multi-cell update, this means that, speaking strictly in terms of safety and not performance, Riak's and Voldemort's consistency models were, prior to Cassandra 2.0, a strict superset of Cassandra's. For instance, you can guarantee the visibility and transactional isolation of a write making multiple changes to a single Riak object; I'm reasonably confident that you cannot achieve those guarantees in, say, a Cassandra collection without a Paxos transaction.

You can certainly emulate Riak's consistency model by storing a distinct object for every write, and this is, as I understand it, what many Cassandra users do. The difference is in space consumption. Consider making four updates to an object. In Cassandra, you could write each update to a separate cell. In Riak, you might write them all to the same key:

    Cassandra           Riak
    [update1]           [update1|update2|update3|update4]
    [update2]
    [update3]
    [update4]
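To make the LWW-versus-siblings distinction concrete, here is a minimal sketch of a toy in-memory store. It assumes a vector clock is a plain dict of node id to counter; `descends`, `lww_put`, and `sibling_put` are hypothetical helpers, not Riak's or Cassandra's client APIs. Under LWW, a concurrent write with a lower timestamp simply vanishes; with vclocks, both writes survive as siblings until a later write that has seen both supersedes them.

```python
# Toy in-memory model; descends, lww_put, and sibling_put are hypothetical
# helpers, not Riak's or Cassandra's client APIs.
VClock = dict[str, int]  # node id -> counter


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event clock b has (a >= b pointwise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def lww_put(store: dict, key: str, value: str, ts: int) -> None:
    """Last-write-wins: keep only the value with the highest timestamp."""
    current = store.get(key)
    if current is None or ts > current[1]:
        store[key] = (value, ts)


def sibling_put(store: dict, key: str, value: str, clock: VClock) -> None:
    """Keep every value whose vector clock no newer write has superseded."""
    siblings = store.get(key, [])
    if any(descends(existing, clock) for _, existing in siblings):
        return  # stale or replayed write: a value we hold already covers it
    # Drop siblings the new write has seen; keep the concurrent ones.
    survivors = [(v, c) for v, c in siblings if not descends(clock, c)]
    store[key] = survivors + [(value, clock)]


lww, riak = {}, {}
# Two clients write concurrently, neither having seen the other's write.
lww_put(lww, "k", "update1", ts=100)
lww_put(lww, "k", "update2", ts=101)
sibling_put(riak, "k", "update1", {"a": 1})
sibling_put(riak, "k", "update2", {"b": 1})

print(lww["k"])                         # ('update2', 101): update1 is gone
print(sorted(v for v, _ in riak["k"]))  # ['update1', 'update2']
```

Once update1 loses under LWW, nothing in the store can bring it back; in the sibling store it stays visible to the next reader, who can merge it.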
To read from both Cassandra and Riak, you need a merge function. Since neither provides ordering constraints, our merge must be associative, commutative, and idempotent in both cases.

    Cassandra           Riak
    [update1]+          [update1|update2|update3|update4]
    [update2]+              |       |       |       |
    [update3]+              +-------+-------+-------+
    [update4]+                          |
             |                          |
             V                          V
      [current value]            [current value]
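The thread doesn't name a data type, so this sketch assumes a grow-only counter (G-Counter), a standard CRDT whose merge is a pointwise max. It shows the same merge function reducing Cassandra-style cells and Riak-style siblings to the same current value regardless of order, and checks the three properties named above on sample states.

```python
from functools import reduce

GCounter = dict[str, int]  # node id -> count


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Pointwise max of two G-Counter states."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of every node's entry."""
    return sum(counter.values())


# Each update is a full G-Counter state as written by one client.
updates = [{"a": 1}, {"a": 2}, {"b": 1}, {"c": 3}]

cells = list(updates)               # Cassandra-style: one cell per update
siblings = list(reversed(updates))  # Riak-style: siblings, in arbitrary order

current_c = reduce(merge, cells, {})
current_r = reduce(merge, siblings, {})
print(current_c == current_r, value(current_c))  # True 6

# The three properties that make read order (and duplicates) irrelevant:
x, y, z = updates[0], updates[2], updates[3]
print(merge(x, y) == merge(y, x))                      # commutative: True
print(merge(merge(x, y), z) == merge(x, merge(y, z)))  # associative: True
print(merge(x, x) == x)                                # idempotent: True
```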
The difference is in space. Vector clocks allow you to prune the causal history, meaning we can write back [current value], and as soon as a node sees that write, it can discard updates 1-4. In Cassandra, there is no causality tracking: you have to figure out how to do GC yourself, or punt. After a fifth update, the two look like this:

    Cassandra           Riak
    [update1]           [merged value|update5]
    [update2]
    [update3]
    [update4]
    [update5]
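Here is a sketch of that pruning step under the same toy assumptions (clocks as dicts, hypothetical helpers `descends`, `join`, and `put`; node ids are made up). The write-back carries the join of the siblings' clocks, so it descends all of them and they can be discarded. A fifth update from a client that never saw the write-back is concurrent with it, so both remain.

```python
from functools import reduce

# Toy model: a vector clock is a dict of node id -> counter; each sibling is
# a (value, clock) pair. The helpers are hypothetical, not Riak's API.


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has."""
    return all(a.get(n, 0) >= c for n, c in b.items())


def join(a: dict, b: dict) -> dict:
    """Smallest clock that descends both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def put(siblings: list, value: str, clock: dict) -> list:
    """Add a write, discarding any sibling whose history it already covers."""
    if any(descends(c, clock) for _, c in siblings):
        return siblings  # stale write
    return [(v, c) for v, c in siblings if not descends(clock, c)] + [(value, clock)]


siblings: list = []
# updates 1-4 arrive concurrently through nodes a, b, c, d.
for i, node in enumerate("abcd", start=1):
    siblings = put(siblings, f"update{i}", {node: 1})
print(len(siblings))  # 4

# A reader merges the four siblings and writes back with the joined clock.
merged_clock = reduce(join, (c for _, c in siblings), {})
siblings = put(siblings, "merged value", merged_clock)
print([v for v, _ in siblings])  # ['merged value']: updates 1-4 are pruned

# update5 comes from a client that had only seen update1 (hypothetical node "e").
siblings = put(siblings, "update5", {"a": 1, "e": 1})
print([v for v, _ in siblings])  # ['merged value', 'update5']
```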
You can see how unbounded space might be a problem. From my conversations with DataStax, it sounds like users tend to write reducers which apply their merge function to compact some portion of the history. Which portion? Well, without causality tracking, we'll leave that as an exercise for the reader.

    Cassandra           Riak
    [update1-4]         [merged value|update5]
    [update5]
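A single-process sketch of such a reducer, assuming G-Counter cells keyed by a hypothetical cell id and a write timestamp. The "fold everything at or before a cutoff timestamp" rule is an assumed heuristic, not something DataStax prescribes. Because the merge is associative, commutative, and idempotent, compaction doesn't change what readers see, and a delayed write stamped before the cutoff just stays a separate cell. What the sketch sidesteps is the hard distributed part: deciding which cells the reducer has actually seen before it deletes them, which is exactly the causal information LWW doesn't record.

```python
from functools import reduce

GCounter = dict[str, int]                # G-Counter state: node -> count
Cells = dict[str, tuple[int, GCounter]]  # cell id -> (write timestamp, state)


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Pointwise max: associative, commutative, idempotent."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def read(cells: Cells) -> GCounter:
    """Fold every cell with the merge function; cell order doesn't matter."""
    return reduce(merge, (state for _, state in cells.values()), {})


def compact(cells: Cells, cutoff_ts: int, new_id: str) -> Cells:
    """Hypothetical reducer: fold cells written at or before cutoff_ts into one.

    The cutoff is a guess, since nothing records which writes a cell has seen.
    """
    old = {k: v for k, v in cells.items() if v[0] <= cutoff_ts}
    if not old:
        return dict(cells)
    rest = {k: v for k, v in cells.items() if v[0] > cutoff_ts}
    folded = reduce(merge, (state for _, state in old.values()), {})
    rest[new_id] = (max(ts for ts, _ in old.values()), folded)
    return rest


cells: Cells = {
    "u1": (100, {"a": 1}), "u2": (101, {"a": 2}),
    "u3": (102, {"b": 1}), "u4": (103, {"c": 3}),
    "u5": (200, {"b": 4}),
}
before = read(cells)
cells = compact(cells, cutoff_ts=150, new_id="u1-4")
print(sorted(cells))          # ['u1-4', 'u5']
print(read(cells) == before)  # True: readers see the same value

# A delayed write stamped before the cutoff lands after compaction; it just
# stays a separate cell until some later compaction folds it in.
cells["late"] = (120, {"d": 1})
print(sorted(cells))  # ['late', 'u1-4', 'u5']
```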
Does this look familiar? Yeah. It's the same concurrency model as the vector clocks this post is arguing against. You just have to do more work.

Now, there are all sorts of practical efficiency constraints at play! For instance, Riak has ~50-100 bytes of overhead per key, and will start barfing if you go over 10 megabytes or so per key. And without being able to call list-keys, you wind up having to play all kinds of games with predictable keys, splitting datasets between multiple objects, and so on. Cassandra's IO throughput generally seems much higher than Riak's, and Cassandra has a much more efficient representation for wide values. It also offers better key ranges, but you also pay a per-cell overhead for every atomic chunk of state. Not so efficient if you were looking to store, say, big blocks of integers for your CRDTs.

The great thing is, again speaking purely in terms of consistency, that Cassandra 2.0 is now capable of a superset of Riak's operations! If correctly implemented, their Paxos operations support linearizable reads and writes, which is a much stronger class of consistency than the CRDT operations described above. I don't understand why jbellis is so upset when folks point out that LWW provides weak safety constraints, when their strongly-consistent operations now offer the highest level of transactional safety. Seems like we should be celebrating that achievement, because it opens up large classes of operations which were previously unsafe. :)
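To illustrate the class of operations that compare-and-set makes safe, here is a toy single-register model (the `Register` class is hypothetical, not Cassandra's API). Two clients read the same value and each add 1. With blind LWW writes, one increment is lost; with a compare-and-set retry loop, both survive. In Cassandra 2.0 the equivalent is a lightweight transaction (a CQL `UPDATE ... IF column = value`) backed by Paxos; making `cas` atomic across replicas is the expensive part this sketch omits entirely.

```python
class Register:
    """Toy single-key register; not Cassandra's API."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.ts = 0

    def lww_write(self, value: int, ts: int) -> None:
        """Blind write: the highest timestamp wins, whatever it overwrote."""
        if ts >= self.ts:
            self.value, self.ts = value, ts

    def cas(self, expected: int, new: int) -> bool:
        """Compare-and-set: apply only if the value is still what we read."""
        if self.value != expected:
            return False
        self.value = new
        return True


# Two clients both read 10 and each try to add 1.
lww_reg = Register(10)
a_read, b_read = lww_reg.value, lww_reg.value
lww_reg.lww_write(a_read + 1, ts=1)
lww_reg.lww_write(b_read + 1, ts=2)
print(lww_reg.value)  # 11: one increment is lost

cas_reg = Register(10)
a_read, b_read = cas_reg.value, cas_reg.value
cas_reg.cas(a_read, a_read + 1)             # A's write applies
while not cas_reg.cas(b_read, b_read + 1):  # B's read is stale: re-read, retry
    b_read = cas_reg.value
print(cas_reg.value)  # 12: both increments survive
```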
I don't think he's upset about LWW being characterized as a weak safety constraint, but about the perception that what Cassandra provides is equivalent to per-key LWW. While it doesn't completely eliminate the chance of data loss caused by conflicts, breaking a complex data structure into atoms that resolve independently vastly improves the average and P99 (and probably many more 9s) cases. The argument being made is that, while not as correct as vclock + sibling resolution, this is within the threshold many real-life use cases are willing to tolerate.
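A toy illustration of that argument, assuming a hypothetical two-field record and made-up timestamps. When two clients concurrently change different fields, whole-object LWW drops one change, while per-cell LWW keeps both. When they change the same field, per-cell LWW still loses one write, which is the residual risk both comments acknowledge.

```python
def lww_object(a: tuple, b: tuple) -> tuple:
    """a and b are (timestamp, record); the newer record wins wholesale."""
    return a if a[0] >= b[0] else b


def lww_cells(a: dict, b: dict) -> dict:
    """a and b map field -> (timestamp, value); each field resolves on its own."""
    out = dict(a)
    for field, (ts, val) in b.items():
        if field not in out or ts > out[field][0]:
            out[field] = (ts, val)
    return out


base = {"name": "ann", "email": "ann@old.example"}

# Object-level LWW: each client rewrites the whole record it read.
w1 = (10, {**base, "email": "ann@new.example"})
w2 = (11, {**base, "name": "Ann"})
print(lww_object(w1, w2)[1])
# {'name': 'Ann', 'email': 'ann@old.example'}: the email change is lost

# Cell-level LWW: each client writes only the cell it changed.
c1 = {"email": (10, "ann@new.example")}
c2 = {"name": (11, "Ann")}
merged = lww_cells(c1, c2)
print({f: v for f, (_, v) in merged.items()})
# {'email': 'ann@new.example', 'name': 'Ann'}: both changes survive

# Concurrent writes to the same cell still lose one of them.
c3 = {"email": (12, "ann@other.example")}
print(lww_cells(c1, c3)["email"])  # (12, 'ann@other.example')
```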
The other thing I think is mischaracterized is the idea that the choice to use timestamps over vector clocks was made out of ignorance, or that nothing is gained by it. This was a conscious choice, made with the performance trade-off in mind. We should strive for the largest amount of correctness given the constraints of performance and/or availability. While the CAS operations in C* 2.0 are useful, they sacrifice a lot on those fronts to gain that correctness. Systems that needlessly trade away correctness without returning serious dividends (I'm sure we can all name a few) add no value.