Since they're not likely to approve my comment on their blog, here's what I said:

"Way to misrepresent[1] vector clock usage in Riak! LWW deliberately ignores the vector clock. No one would use that in production without a strong assurance that they will never have concurrent writes. Also note that later in the post[2] Kyle shows how using vector clocks properly leads to zero data loss.

[1] https://yourlogicalfallacyis.com/strawman
[2] http://aphyr.com/posts/285-call-me-maybe-riak"

To add on,

> Cassandra addresses the problem that vector clocks were designed to solve by breaking up documents/objects/rows into units of data that can be updated and merged independently. This allows Cassandra to offer improved performance and simpler application design.

This is not solving the problem vector clocks solve; it is punting on the resolution issue. Perhaps LWW partial updates result in greater performance, but performance is the only thing they solve. Listen to or watch http://thinkdistributed.io/blog/2012/08/28/causality.html

To be fair, both designs are valid choices, but jbellis should be honest about his tradeoffs and not simply cast aspersions on other valid designs because they aren't the one that C* chose.
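For anyone unclear on the distinction, here is a minimal Python sketch of what the vector clock buys you and what LWW throws away. It assumes a simplified model where each stored version carries a vector clock plus a wall-clock timestamp; all function names are hypothetical and this is not Riak's actual implementation.

```python
from typing import Dict, List, Tuple

# A vector clock maps an actor (client or vnode) id to a counter.
VClock = Dict[str, int]
# A stored version: (vector clock, wall-clock timestamp, value).
Version = Tuple[VClock, float, str]


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of `clock` with `actor`'s counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def merge_clocks(a: VClock, b: VClock) -> VClock:
    """Pointwise maximum: the clock of a write that has seen both a and b."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen every event recorded in `b`."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def dominates(a: VClock, b: VClock) -> bool:
    """True if `a` is strictly newer than `b`."""
    return descends(a, b) and not descends(b, a)


def resolve_with_clocks(versions: List[Version]) -> List[Version]:
    """Drop only versions that another version strictly supersedes.
    Concurrent writes survive as siblings for the application to merge."""
    return [v for v in versions
            if not any(dominates(o[0], v[0]) for o in versions)]


def resolve_lww(versions: List[Version]) -> Version:
    """Last-write-wins: ignore the clocks, keep the highest timestamp."""
    return max(versions, key=lambda v: v[1])


base = increment({}, "client_a")
# Two clients read `base` and write without seeing each other's update.
w1 = (increment(base, "client_a"), 100.0, "milk")
w2 = (increment(base, "client_b"), 99.5, "eggs")

print(len(resolve_with_clocks([w1, w2])))  # 2: both kept as siblings
print(resolve_lww([w1, w2])[2])            # milk: "eggs" is silently lost
```

A later write whose clock is `merge_clocks` of both siblings plus one increment dominates them both, so `resolve_with_clocks` prunes the old versions without any help from the application.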
As far as I can determine in testing with Jepsen, there are no cases where one can safely (i.e. in a way which guarantees some causal connection of your write to a future state of the system) update a cell in Cassandra without a strong timestamp coordinator: either an external system like ZooKeeper, or Cassandra 2.0 Paxos transactions.
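To make the timestamp problem concrete, here is a minimal Python sketch of per-cell last-write-wins, assuming client-supplied timestamps and one client whose clock runs fast. `Cell`, `lww_merge`, and `write` are hypothetical names for illustration, not Cassandra's API, and breaking timestamp ties on the larger value is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cell:
    value: int
    timestamp: int  # supplied by the writing client, not by a coordinator


def lww_merge(a: Cell, b: Cell) -> Cell:
    """Per-cell last-write-wins: the higher timestamp wins.
    Breaking ties on the larger value is an assumption of this sketch."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


def write(row: Dict[str, Cell], column: str, cell: Cell) -> None:
    """Apply a write to one cell of a row by merging it with what is stored."""
    current = row.get(column)
    row[column] = cell if current is None else lww_merge(current, cell)


row: Dict[str, Cell] = {"balance": Cell(value=100, timestamp=1_000)}

# Client A's clock runs fast. A reads 100 and writes 90.
write(row, "balance", Cell(value=90, timestamp=1_060))
# Client B reads A's 90 and writes 80 later in real time, but B's
# accurate clock gives its write a lower timestamp.
write(row, "balance", Cell(value=80, timestamp=1_020))

print(row["balance"].value)  # 90: B's causally later write was discarded
```

Nothing in the cell records that B's write was based on A's, so the merge can only compare timestamps; a coordinator that hands out ordered timestamps (or a Paxos round) is what restores that causal link.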
Most of the production users and DataStax employees I've spoken with recommend structuring your data as a write-once immutable log of operations and merging conflicts on read. This is exactly the same problem that Riak siblings have, except that you don't get the benefit of automatic pruning of old versions, because there's no causality tracking. GC is up to the user.
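A minimal sketch of the write-once log pattern, assuming a shopping-cart-style set with two-phase-set merge semantics (removes are final) chosen purely for illustration; `Op`, `OpLog`, and the merge rule are hypothetical, not anything DataStax prescribes. Note that nothing in it records which ops every replica has seen, so it has no safe way to prune the log on its own.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "remove"
    item: str


@dataclass
class OpLog:
    # Each op is stored once under a fresh random key and never updated,
    # so concurrent writers cannot overwrite each other's cells.
    ops: Dict[str, Op] = field(default_factory=dict)

    def append(self, op: Op) -> str:
        key = str(uuid.uuid4())
        self.ops[key] = op
        return key

    def read(self) -> Set[str]:
        """Merge on read: an item is present if some op added it and no
        op removed it (two-phase set semantics, so removes are final)."""
        added = {op.item for op in self.ops.values() if op.kind == "add"}
        removed = {op.item for op in self.ops.values() if op.kind == "remove"}
        return added - removed


log = OpLog()
log.append(Op("add", "milk"))
log.append(Op("add", "eggs"))
log.append(Op("remove", "milk"))
print(sorted(log.read()))  # ['eggs']
print(len(log.ops))        # 3: the superseded ops are never pruned
```

Compacting `ops` into a snapshot is only safe once every replica has seen every op being folded away, and nothing here tracks that; that is the GC burden that lands on the user.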