| HN Mirror


by mweidner 73 days ago

A CRDT that operates on code units should work out okay, because each grapheme cluster will always be inserted and deleted in a single edit, so its code units should stay together in the text. (Some CRDTs can actually break this by interleaving concurrently inserted code units, but Yjs avoids doing so.)

From the fix PR, I believe the issue in this case was with the insertion operations passed to the CRDT, not the CRDT itself. Specifically, Yjs's ProseMirror integration infers what text was inserted by diffing the before and after states, instead of directly capturing user inputs (even though those are provided by ProseMirror transactions). The diff algorithm, lib0/diff, was not grapheme-aware and hence could generate an inaccurate diff containing lone surrogates.
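
To make the failure mode concrete, here is a minimal sketch of a prefix/suffix diff over code units. It is not lib0/diff's actual code and not the actual fix; `SimpleDiff`, `diffCodeUnits`, and the `surrogateAware` flag are names I made up for illustration, and it assumes both inputs are well-formed UTF-16. It only guards against splitting surrogate pairs, not full grapheme clusters.

```typescript
// Hypothetical shape of a diff result; offsets are in UTF-16 code units.
interface SimpleDiff {
  index: number;  // where the change starts
  remove: number; // how many code units to delete
  insert: string; // text to insert at index
}

const isHighSurrogate = (c: number): boolean => c >= 0xd800 && c <= 0xdbff;
const isLowSurrogate = (c: number): boolean => c >= 0xdc00 && c <= 0xdfff;

// Strip the common prefix and suffix of a and b, comparing code units.
// With surrogateAware = true, back off whenever a boundary would land
// in the middle of a surrogate pair.
function diffCodeUnits(a: string, b: string, surrogateAware: boolean): SimpleDiff {
  let left = 0;
  while (left < a.length && left < b.length &&
         a.charCodeAt(left) === b.charCodeAt(left)) {
    left++;
  }
  // The shared prefix ends right after a high surrogate: step back one unit.
  if (surrogateAware && left > 0 && isHighSurrogate(a.charCodeAt(left - 1))) {
    left--;
  }
  let right = 0;
  while (right + left < a.length && right + left < b.length &&
         a.charCodeAt(a.length - right - 1) === b.charCodeAt(b.length - right - 1)) {
    right++;
  }
  // The shared suffix starts with a low surrogate: shrink it by one unit.
  if (surrogateAware && right > 0 && isLowSurrogate(a.charCodeAt(a.length - right))) {
    right--;
  }
  return {
    index: left,
    remove: a.length - left - right,
    insert: b.slice(left, b.length - right),
  };
}

// U+1F600 and U+1F601 share the high surrogate D83D and differ only in
// the low surrogate, so a pure code-unit diff cuts the pair in half.
const before = "\uD83D\uDE00";
const after = "\uD83D\uDE01";
const naive = diffCodeUnits(before, after, false); // insert is a lone "\uDE01"
const aware = diffCodeUnits(before, after, true);  // insert is the whole pair
```

The naive result is a perfectly valid edit at the code-unit level, which is why the CRDT itself isn't at fault: it faithfully stores the lone surrogate it was handed.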

Operating on code units is convenient in JavaScript because then your CRDT's `length` matches the language's `String.length`, and likewise for indexed access.
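
A small sketch of what that convenience buys you. The helper names (`codePointLength`, `codePointToCodeUnitIndex`) are hypothetical, not part of Yjs or any library; they stand in for the translation layer you would need if the CRDT counted code points instead of code units.

```typescript
// Count code points by treating each well-formed surrogate pair as one.
function codePointLength(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) i++; // skip the low surrogate
    }
    count++;
  }
  return count;
}

// Convert a code-point offset into the code-unit offset that
// String.prototype.slice and indexed access expect. This is an O(n) scan.
function codePointToCodeUnitIndex(s: string, cpIndex: number): number {
  let i = 0;
  for (let cp = 0; cp < cpIndex && i < s.length; cp++) {
    const c = s.charCodeAt(i);
    const isPair = c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
      s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff;
    i += isPair ? 2 : 1;
  }
  return i;
}

const text = "a\uD83D\uDE00b"; // "a", U+1F600, "b"
const unitLength = text.length;                       // 4 code units
const pointLength = codePointLength(text);            // 3 code points
const unitIndexOfB = codePointToCodeUnitIndex(text, 2); // 3
const loneHalf = text.charAt(1);                      // "\uD83D", half of the emoji
```

With code units, the CRDT's offsets can be passed straight to `slice` and friends with no scan. The trade-off is the last line: indexing by code unit will happily hand you half of a surrogate pair.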