| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by amsha 2810 days ago

Here's the key sentence: "concrete implementations of CRDT in co-editors revealed key missing steps in CRDT literature." This paper may be correct for academic CRDTs but it is very wrong when looking at industry implementations.

My hunch is that because CRDTs are so much easier to grok than OT, engineers are empowered to make use case-specific improvements that aren't reflected in academic literature.

For example, the Logoot/LSEQ CRDTs have issues with concurrent insert interleaving; however, this can be solved by dedicating a bit-space in each ID solely for followup inserts. The "Inconsistent-Position-Integer-Ordering" problem is solved by comparing ids level-by-level instead of in one shot.

Fundamentally, CRDTs have strong strategic advantages over OT. Given a document size N and a document edit history H:

* CRDT size is O(N) where OT is O(H)

* CRDT updates are O(log N) where OT is O(H^2)

In any nontrivial document, H ≫ N. This means CRDTs have much better perf than OT. Additionally, the best CRDTs (like Logoot/LSEQ) don't require tombstones, garbage collection, or "quiescence." The complexity burden is far lower.

To top it off, CRDTs are offline-capable by default.

2 comments

nused 2809 days ago

Casual and blanket theoretical characterizations of OT and CRDT algorithms is exactly the kind of folly that the article is trying to address.

For example, it's a logical contradiction to begin the statement with "OT's complexity is _generally_ governed by editing history (H), i.e. O(H)" to conclude that "CRDTs have much better perf than OT", then adding a qualification that argument applies only for the specific off-line editing scenario (which btw is also not true).

It's also inaccurate to base CRDT's complexity on N - the document size, i.e. number of characters visible in the document. You need to include tombstones for the case of WOOT variants or use garbage collection for RGA (which then requires vector clocks). These nuances are described in sections 4.3.

OT's complexity is governed by O(c) where c is number of concurrent operations, period.

For off-line edits, the short answers is that OT's complexity is also not O(H), if H is the number of character edits, because you can easily apply compression. Now, the longer answer: there is pretty strong spatial locality in the distribution of edits over a document -- we don't sporadically and randomly add or delete characters around a document, but intuitively, most of the inserts will be adjacent characters (i.e. strings), and many of the deletes are over newly inserted strings. OT uses string-wise operations, hence it will compress n consecutive character insert ops down to a single string op. In addition you can compress delete edits over newly inserted content: i.e. [insert 'To be or not yoo be' at 0, delete 'y' at '14', delete 'o' at '14', insert 't' at '14'] => [insert 'To be or not to be']. This is basic idea behind "operation compression" which is what OT would use to support off-line, non-real-time, asynchronous editing, however you want to call it. There is a better example here (http://www3.ntu.edu.sg/home/czsun/projects/otfaq/#_Toc321146...).