| > I disagree. You can create a CRDT flavor of data structure whose ops are not commutative. For example, a Set's add and delete operations. These are not commutative. You cannot switch the order of the ops for meaningfully processing them. However, you can create a CRDT Set. set add (union) is commutative, set delete is not. so a set with only the add (union) operation is a CRDT, but a set that supports both add and delete is not a CRDT, at least not without very specific caveats. you can create an add-only CRDT set, or an add-remove CRDT set, or a variety of other CRDT sets that support specific operations with specific caveats. but you can't create a CRDT set that is a fully-fledged set, with all the operations that sets generally provide. > You do that by adding metadata to the ops, and having the instances always process them in the only order that makes sense even if such instances receive the ops in a different order. so this is the crux of the issue, I think -- "having the instances always process [ops] in the same order" is basically not possible in any real-world network first, because it's not possible to decide what the "correct" set of ops actually is, nodes are always subject to partitions, faults, etc. which prevent reliable dissemination of knowledge, and plus all the stuff about light cones and etc. second, because "the same order" implies a specific total ordering of events is unknowable (see prior comments) > you are "ensuring" ops are behaving like they are commutative and not "requiring" them to be so converting a non-commutative operation to a commutative operation in a distributed system requires reliable delivery, which no network provides the whole point of CRDTs is that they give you a formally strong version of eventual consistency that holds even in the face of (unavoidably) unreliable delivery > My understanding is that a CRDT is op-based or state-based depending on what is "communicated" between the instances. If ops are communicated, then it is op-based CRDT, whereas if states (or delta-states) are communicated, then it is state-based. this is true in the abstract, but the issue is that "what is communicated" is not a given, it's subject to the choices you make when you encounter a network fault -- or in the CAP model, a partition CRDTs are tools for eventually consistent (AP) systems, which means that you have to keep making forward progress if there are partitions, which means that message delivery between nodes is not reliable, it can always fail for state-based CRDTs if you fail to deliver a message it's fine, the information in that message is not lost forever, it will be included in the next message, and (if the partition is eventually healed) the ultimate state will converge. this is also true for delta states but for op-based CRDTs if you fail to deliver a message it's not fine, the information in that message is lost forever, it won't be included in the next message, and (even if the partition is eventually healed) the ultimate state will not converge |
By having 1) causal order (eg. using what the article refers to as Lamport Causal Clock) and 2) a deterministic sorting function to sort ops that happened concurrently (from the perspective of causal order), we can derive total order.
It’s absolutely possible and used.
And with those two properties, almost any data structure can be turned into a (op-based) CRDT.
That is to say, thoughtlede has it correct in their comments above.