| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by aphyr 3418 days ago

> assume updates can be sometimes applied in a different order than they were submitted

I suggest you re-read the analysis. Some databases can offer safe, generalized commutative updates; e.g. Riak. Cassandra can't: updates, in general, can be lost through reordering.

> There is no better way than last write wins if you want high availability and partition tolerance and don't want to pay the performance and availability price for a consensus algorithm like Paxos or Raft.

There is. There's a whole field of research devoted to this problem. http://hal.upmc.fr/inria-00555588/document

1 comments

pkolaczk 3418 days ago

You can have safe updates through clustering columns or if you really insist on destructive updates - through LWTs. With clustering columns you can easily achieve whatever is possible with vector clocks.

http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-v...

As for the research you posted, there is no free lunch. Each of these strategies come with their own set of drawbacks. That's why Cassandra offers choice at a query level.

link

aphyr 3418 days ago

Clustering columns do not make Cassandra updates any safer: they only reduce the scope of conflicts. It's still last-write-wins--the approach that Cassandra's own blog recognizes as "a high potential for data loss".

That's why Cassandra offers choice at a query level.

It doesn't give you a choice: you get last-write-wins, or some limited merge functions with CQL types. You can't get generalized CRDTs in Cassandra, because they won't expose conflicts to the user layer. Cassandra gives up on a whole class of safe AP datatypes as a consequence of this restriction.

link

pkolaczk 3418 days ago

Now you are arguing about a lack of a particular builtin feature. Riak is last-write-wins by default with optional vector clocks and builtin CRDTs. Cassandra is last-write-wins by default with clustering columns allowing to implement an equivalent of Riak vector clocks and CRDTs in the application. If you include a client id in the primary key and use client-side timestamps, you essentially are doing vector clocks and there are no conflicts guaranteed - users of Cassandra have been doing it for years.

link

pkolaczk 3418 days ago

> they won't expose conflicts to the user layer.

They will, you'll see them as separate rows of the same partition, and then you can merge them as you wish in the application.

link