| HN Mirror

Maybe. Here's another way to think of the algorithm:

All the complexity comes about because we're trying to convert the insert / delete position from edits (expressed at their original version) to some later current version.

There's lots of ways of solving this problem. For example, we could build a data structure which contains metadata for every inserted item in a text document. For every inserted character, we store when the item was inserted and when (if ever) the item was deleted.

Then you could implement the algorithm in a simpler way. Lets say I'm trying to insert at position 1000, at some version V.

- We scan the list of characters from the start of the document, looking for the 1000th item which was actually in the document at version V.

- For each character in the list, we can tell if that item was inserted at version V by comparing V to the stored inserted / deleted at times.

This algorithm would be correct, and it avoids retreat / advance. The only problem with this approach is that it would be slow - because you're constantly scanning the document to convert insert positions. Inserting N items into a document take O(N^2) time.

The retreat / advance approach described in the paper is an optimization on top of this algorithm which performs the same work in O(N log N) time.

I wish we made this more clear in the paper. In an earlier draft we spent about 5 pages simply talking about version theory. The algorithm was then described using that theory with a stronger theoretical grounding. But I think that description may have been even more confusing.

> You say in the paper this should also work for other applications than plain text, but I guess then another CRDT has to be constructed to implement apply/retreat/advance. Would it be possible to formulate all of this independently of the application and particular CRDT, together with corresponding correctness theorems?

"Independently of the application and particular CRDT"? I don't know, we might have to think through how that would work for every CRDT. Do you have any personal favorites that would be worth thinking through?

For registers (eg in a variable, dictionary, hash map or array where indexes never change), you could implement a similar algorithm incredibly easily by just doing the version comparison operation on the graph. (The current value is the value set in the graph's frontier.) The retreat / advance optimisation isn't needed at all for registers.

For a list - for example, a list of layers in photoshop - we might need something more complex, since layers can be inserted / deleted like text and as a result the index of subsequent items changes. But layers can also be reordered - and that requires some thought. For rich text, there's an approach that I think would work but I haven't implemented it yet.