Hacker News new | ask | show | jobs
by josephg 635 days ago
First paragraph: yes, exactly.

> OTs take an event..

This is how the early Jupiter OT works, yes. And most OT systems work like this. But there are also some papers on more recent OT systems which can work with more than 2 peers. Unfortunately, many of these systems have turned out to have convergence bugs and/or they are O(n^2). For our paper one of our example datasets takes tens of milliseconds to replay with CRDTs and egwalker but an hour of time with OT!

> the data structure is capable of representing two different versions…

With egwalker it’s important to distinguish between two different data structures. There’s a grow only set of original editing events. This is persisted to disk and replicated over the network. Then while actually replaying events or merging, we generate a second, temporary, in memory data structure which resembles a normal CRDT. (Except with an extra state field on each item like you said). This crdt state object isn’t persisted. It’s usually discarded as soon as the merge (transform) operation is complete. One big advantage of this approach is that this data structure does not need to represent all items ever inserted. Just the concurrent items, back to the most recent common branch. So it’s usually tiny. And that allows history to be pruned - which CRDTs typically don’t allow.

But yes, everything else is right!

1 comments

This is probably a question about classic CRDTs as much as eg-walker:

Do all possible topological sorts of the event graph result in the same final consensus document? If yes how do we know that, and if no, how do they resolve the order in which each branch is applied?

> Do all possible topological sorts of the event graph result in the same final consensus document?

Yes. Thats usually referred to as the "convergence property".

> If yes how do we know that

Usually, careful design, mathematical proofs and randomized (fuzz) testing. Fuzz testing is absolutely essential - In over a decade of working on systems like this, I don't know if I've ever implemented something correctly first try. Fuzz testing is essential. You shouldn't trust the correctness of any system which haven't been sufficiently fuzzed. (Luckily, fuzzers are easy to write, and the convergence property is very easy to test for.)

For Eg-walker, I think we've pumped around 100M randomly generated events (in horribly complex graphs) through our implementation to flush out any bugs.

This seems to be a field perfect for theorem proving, I think I've seen some work by Kleppmann using Isabelle.

I once tried to understand the Yjs paper, but I came to the conclusion that their proof is just wrong! They do some impressively looking logical reasoning in the paper, but they define some order in terms of itself, so they don't really show anything, if I remember correctly. If you tried that in Isabelle, it would stop you already at the very start of all that nonsense.

I talked to Kevin Jahns (the author of the YATA paper & Yjs) about his paper a few years ago. He said he found errors in the algorithm described in the paper, after it was published. The algorithm he uses in Yjs is subtly different from YATA in order to fix the mistakes.

He was quite surprised the mistakes went unnoticed through the peer review process.

There have also been some (quite infamous) OT algorithm papers which contain proofs of correctness, but which later turned out to actually be incorrect. (Ie, the algorithms don't actually converge in some instances).

I'm embarassed to say I don't know Isabelle well enough to know how you would use it to prove convergence properties. But I have gotten very good at fuzz testing over the years. Its wild how many bugs in seemingly-working software I've found using the technique.

I think ideally you'd use both approaches.

Ah, that makes sense! I thought that Yjs must be doing something differently than described, because it seems to work well in practice, but I couldn't see how Yata would. Anyway, I learnt a lot by thinking through that paper :-)

Fuzz testing and proof are complementary, I think, both catch things the other one might not have caught. The advantage of Fuzz testing is that it tests the real thing, not a mathematical replica of it.