Hacker News new | ask | show | jobs
by jitl 884 days ago
I'm curious about this line describing REG:

> The REG algorithm excels with its fast local update speeds and eliminate concerns about tombstone collection in CRDTs. For instance, if an operation has been synchronized across all endpoints, no new operations will occur concurrently with it, allowing it to be safely removed from the history.

If you remove these ops from history, does that remove the ability to time travel (per the home page "An antidote to regret, enabling historical edits traversal") or merge branches? How can we be sure an operation is synchronized?

If dropping these ops is necessary for speed/storage optimization but disables time-travel, is it possible to put the removed historical/tombstone ops into a "cold storage" that's optional and only loaded for time-travel use?

3 comments

Hi! I invented replayable event graphs. I'm writing a paper at the moment about it, which hopefully should be out in a month or so. Send me a private email and I can mail you the current draft if you like.

> If you remove these ops from history, does that remove the ability to time travel

Yes it does. You also need the ops from history to be able to merge changes. You can only merge changes so long as you have the operations going back to the point at which the fork happened.

> is it possible to put the removed historical/tombstone ops into a "cold storage" that's optional and only loaded for time-travel use?

Absolutely. And this is very practically useful. For example, you could have a web page which loads the current state of a document (just a string. Unlike CRDTs, it needs no additional metadata!). Then if some merge happens while you have the document open, the browser could just fetch the operations from the server back as far as it needs to be able to merge. But in normal operation, none of the historical operations need to be loaded at all.

All this said, with text documents the overhead of just keeping the historical operations is pretty tiny anyway. In my testing using diamond types (same algorithm, different library), storing the entire set of historical operations usually increases the file size by less than 50% compared to just storing the final text string. Its much more efficient on disk than git, and more efficient than other CRDTs like automerge and Yjs. So I think most of the time its easier to just keep the history around and not worry about the complexity.

Is this a special case of operational transforms, vv or something else entirely?
I think you can argue that it’s both a special case of an OT system, and a special case of a CRDT at the same time. It stores the original operations rather than transformed operations (like an OT system would). Then it does OT in bulk on those operations on every peer by constructing a crdt state object on the fly.

So it’s very closely related to both crdt and OT based systems. I think if you squint your eyes you could argue that it’s both. It’s a grow only set crdt, storing operations that we do OT on. We do OT by embedding another crdt implementation inside each peer.

> If you remove these ops from history, does that remove the ability to time travel (per the home page "An antidote to regret, enabling historical edits traversal") or merge branches?

Yes. But squash can be supported.

> How can we be sure an operation is synchronized?

In Loro, we not only record the real-world timestamp efficiently, similar to Git, but also capture the DAG information. This approach ensures that if an operation (op) is particularly old, it will have many other ops depending on it. By utilizing both pieces of information, we can determine the operations that are likely synced across all peers. For peers like servers, it's feasible to preserve all operations. However, we can remove some operations in scenarios such as opening the document online for the first time.

> If dropping these ops is necessary for speed/storage optimization but disables time-travel, is it possible to put the removed historical/tombstone ops into a "cold storage" that's optional and only loaded for time-travel use?

Yes. This is not supported at the moment, but we hope to implement it before version 1.0.

> if an operation has been synchronized across all endpoints, no new operations will occur concurrently with it, allowing it to be safely removed from the history.

This assumes that the set of endpoints (really, nodes) is both well-known by all other nodes in the network, and stable over time (meaning new nodes will never be added).

Even if this assumption can be made safely (which is not a given) the GC process described here is still an optimization, which would be subverted when even a single node in the network became slow or broken.

It's also basically orthogonal to the concept of "tombstones", which are still required if you want to delete anything from the data structure.

Similar to OT, in certain scenarios, it's sufficient to ensure that only a subset of peers have the complete data, while others don't need the full history. For instance, in real-time collaboration scenarios with a central server, we can, just like OT, allow clients to hold only a shallow clone instead of the complete history. This approach results in minimal overhead for the clients.
I guess it all depends on how you define "client" and "shallow history", and the guarantees you provide around propagation of that history.

But if you have a central server that is considered to be the authoritative source of state, and assuming clients interact with that central server directly, then I'm not sure what is accomplished by modeling your data with CRDTs in the first place?