| almost! :) let me elaborate on few points. > you have to reference all of the current concurrent root nodes of the data structure, in effect becoming the new single root node correct, and more precisely the union of heads is the current "single root node". in practise, and this is where the merge strategy comes in, the "latest value" is the value of the event that is "last" (as per LWW sorting). > and your event data, which must be a CRDT, gets merged with the CRDTs of those root nodes. the event data itself doesn't have to be a CRDT, can be any data structure. the "root nodes" (meaning the heads of the log) don't get merged with the "event data" (assuming you mean the database/model layer on top of the log), the merge strategy of the log picks the "last/latest event data" to be the latest value of your data structure. > It's time to write a new value, so I create a new tuple with references to 4 and 6, and merge their CRDT values. when a new value is written, correct that the references to 4 and 6 are stored, but the new value doesn't merge the values of the previous events and rather, it's a new value of its own. it may replace the value from one or both of the previous events, but that depends on the data model (layer up from the log). 1 --> 2 --> 3 --> 4
`--> 5 --> 6
> Last Writer Wins, right? So either 4 or 6 dominates the other. Whoever was in the other causal history just... lost their writes?no writes are lost. the result in your example depends what 4 and 6 refer to. in a log database, the ordered log would be eg. 1<-2<-3<-5<-4<-6, so all values are preserved. in the case of a key-value store, it could be that 4 is a set operation to key a and 6 is a set operation to key b, thus the writes don't effect each other. if 4 and 6 are both a set operation on key a, it would mean that key a would have the value from 6 and the next write to key a would overwrite the value in a. makes sense? |
How do you know that? It's not inferrable from the DAG. Is sequencing also provided "a layer up"?
> if 4 and 6 are both a set operation on key a, it would mean that key a would have the value from 6 and the next write to key a would overwrite the value in a.
Yes, I mean for all of my events to be reads and writes of the same key. And you've proven my point, I think: if the resolution of this causal tree is Last-Writer-Wins, 6-domiantes-4, and "key a [gets] the value from 6", then whichever poor user was operating on the 3-4 causal branch has lost their writes.
This is a problem! If you claim to be a CRDT and offline-first or whatever, then as a user, I expect that the operations I make while I'm disconnected aren't just going to be destroyed when I reconnect, because someone else happened to be using a computer with a lexicographically superior hostname (or however you derive your vector clocks).
And if you want to say something like, well, when the unlucky user reconnects and sees that their work has been overwritten, they can just look in the causal history, extract their edits, and re-apply them to the new root -- there's no reason for any of this complex machinery! You don't need CRDTs to just replicate a log of all operations. Of course, you also can't do any meaningful work with such a data structure as a foundation, because it immediately becomes unusably large.