Hacker News new | ask | show | jobs
by a7b3fa 1540 days ago
The authors actually do briefly mention this concern in the section titled 3.7 Algorithm extensions (under Log truncation):

> However, in practice it is easy to truncate the log, because apply_op only examines the log prefix of operations whose timestamp is greater than that of the operation being applied. Thus, once it is known that all future operations will have a timestamp greater than t, then operations with timestamp t or less can be discarded from the log.

The main issue seems to be that we need to know about all the replicas that we want to be able to synchronize with - but I guess there isn't really a way around this.

1 comments

I did miss this section in my skim, but it's also the first place I went as well. It doesn't seem like you could ever be sure that no old messages would be sent with an earlier timestamp if you don't know the set of nodes that are participating in the protocol.

Moreover, if you allow the set of nodes you do know to make operations maliciously (e.g. a single bad node enters the set and tries to cause conflicts), things get much more complicated, and I don't really think you can escape the impossibility results in relation to consensus protocols...

Hi, one of the authors here. You're right that in order to ensure that you won't receive timestamps lower than some threshold, you need to know all the nodes in the system, and you need to hear from all of them (if even just one node is unreachable, that will be enough to hold up the process). That's quite a big assumption to make, but unfortunately there doesn't really seem to be a good way around it. Lots of CRDT algorithms have this problem of requiring causal stability for garbage collection.

Using a consensus protocol is possible, and would have the advantage that it only requires communication with a quorum (typically a majority of nodes), rather than all nodes. However, it has the downside that now a node cannot generate timestamps independently any more — generating a timestamp would then also require a round trip to a quorum, making the algorithm a lot more expensive.