by renke1
1581 days ago
I am planning to use a CRDT sometime in the future. Any thoughts on Automerge vs. Yjs? I am not building a text editor; I just want to build a solid offline-first web application. Also, is there any way to "squash" the history of changes? Let's say I have a central server through which all changes are synced (no peer-to-peer syncing). Does it make sense to force clients that haven't synced for a long time (let's say weeks) to just discard their non-synced changes and use the "current" state as stored on the server? Okay, one more question: let's say I want to add an API to my server that uses the data that was synced to the server (assuming the sync state of Automerge/Yjs is stored somewhere). Would the server in this case just be another client that gets the data from the synced state and stores it in an appropriate store (say a SQL database, Elasticsearch, etc.)?
1. Choose Yjs for now.
2. Look at Yjs’s binary “update” format. That is what you should store in your database’s “blob” column. This also allows your backend to receive and transmit updates without hydrating the CRDT into JavaScript class instances. https://docs.yjs.dev/api/document-updates
3. Yjs has its own “gc” that discards deleted content. Without GC, deleted content remains in the CRDT but is hidden from the user’s perspective. You will need to hydrate the CRDT into memory to use the GC feature. I’m not sure how to trigger this GC; it may run whenever you apply an update to a Y.Doc with doc.gc=true.
4. As long as GC is disabled, you can use “snapshots” to restore old versions of the doc. https://docs.yjs.dev/ecosystem/editor-bindings/prosemirror#v...
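To make point 2 a bit more concrete, here is a minimal sketch of a server that only ever handles opaque update blobs. The `UpdateStore` class and its in-memory `Map` are hypothetical stand-ins for a table with a blob column, and the merge function is injected rather than imported so the sketch has no dependencies. Its signature is meant to match `Y.mergeUpdates` from the Yjs document-updates API.

```typescript
// Signature matches Yjs's Y.mergeUpdates; injected so this sketch has no dependencies.
type MergeUpdates = (updates: Uint8Array[]) => Uint8Array;

// Hypothetical in-memory stand-in for a table with one binary "blob" column per document.
class UpdateStore {
  private rows: Map<string, Uint8Array> = new Map();
  private merge: MergeUpdates;

  constructor(merge: MergeUpdates) {
    this.merge = merge;
  }

  // Fold an incoming client update into the stored blob without building a document.
  append(docId: string, update: Uint8Array): void {
    const current = this.rows.get(docId);
    this.rows.set(docId, current ? this.merge([current, update]) : update);
  }

  // The blob a newly connected client would receive and apply locally.
  load(docId: string): Uint8Array | undefined {
    return this.rows.get(docId);
  }
}
```

With Yjs installed, this would presumably be constructed as `new UpdateStore(Y.mergeUpdates)`, and a client would pass the result of `load` to `Y.applyUpdate`.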
So, knowing the above, how would you design a system like the one in your question? I think you could go with a kind of hot/cold storage. Keep the “hot” version of your document in the “current” row of your Postgres table for a document. Send/receive updates to the hot row. Take snapshots on the server whenever you’d like to.
Then, the cold storage. Periodically, you want to GC the hot storage. Before you do that, apply it as an update to some cold storage, maybe a blob in S3 so you don’t permanently lose those deleted values, and your snapshots can work in perpetuity against the cold storage data. Then GC the hot storage.
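The ordering matters here (archive first, GC second), so a rough sketch of that step may help. Everything below is hypothetical naming: `DocStorage` uses two `Map`s in place of the Postgres row and the S3 blob, and `CrdtOps` injects the CRDT operations so the sketch runs without a library. `merge` is where something like `Y.mergeUpdates` would go. `compact` is an assumption on my part: it stands for "hydrate the update into a doc with GC enabled and re-encode it", which I have not verified in detail.

```typescript
type UpdateBytes = Uint8Array;

// Injected CRDT operations, so the sketch does not depend on a specific library.
interface CrdtOps {
  // With Yjs this would presumably be Y.mergeUpdates.
  merge(updates: UpdateBytes[]): UpdateBytes;
  // Assumption: hydrate with GC enabled and re-encode the smaller state.
  compact(update: UpdateBytes): UpdateBytes;
}

interface DocStorage {
  hot: Map<string, UpdateBytes>;  // stand-in for the "current" Postgres row
  cold: Map<string, UpdateBytes>; // stand-in for one S3 blob per document
}

function archiveThenCompact(docId: string, storage: DocStorage, ops: CrdtOps): void {
  const hot = storage.hot.get(docId);
  if (hot === undefined) return; // nothing to archive for this document

  // 1. Fold the hot state into cold storage first, so deleted content survives.
  const cold = storage.cold.get(docId);
  storage.cold.set(docId, cold ? ops.merge([cold, hot]) : hot);

  // 2. Only then replace the hot row with its garbage-collected form.
  storage.hot.set(docId, ops.compact(hot));
}
```

In a real system the two writes would need to be ordered durably (cold write confirmed before the hot row is overwritten), since the second step is the one that loses information.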
I am more unsure about squashing. The naive way I implemented it is to just iterate over and copy all the data from OldHotDoc into a totally new, independent NewHotDoc, and then archive/discard OldHotDoc. This will start a totally new history. What I’ve considered is that if any writes come in from old clients that branched off before the squash, you can still apply the straggler writes to the old hot doc/old cold storage, then manually diff OldHotDoc before/after the change and try to patch NewHotDoc the same way. Eventually you arrange for all clients to switch to the new doc history, and you can choose how long you’ll keep using this janky patch strategy to accept straggler writes before you just discard them.
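For what it's worth, the "diff before/after, then patch" idea could look roughly like this on plain JSON snapshots (for example, what you get from calling `toJSON()` on a shared type before and after applying the straggler update to the old doc). This is only a sketch: `Snapshot`, `Patch`, `diffSnapshots` and `applyPatch` are names I made up, the diff is shallow (top-level keys only), and it is last-write-wins, so it throws away the merge guarantees a CRDT would normally give you.

```typescript
type Snapshot = Record<string, unknown>;

interface Patch {
  set: Record<string, unknown>; // keys added or changed by the straggler write
  removed: string[];            // keys the straggler write deleted
}

// Compare two plain-JSON snapshots of the old document (before/after a straggler write).
function diffSnapshots(before: Snapshot, after: Snapshot): Patch {
  const patch: Patch = { set: {}, removed: [] };
  for (const key of Object.keys(after)) {
    // Structural comparison via JSON text; sensitive to key order in nested objects.
    if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      patch.set[key] = after[key];
    }
  }
  for (const key of Object.keys(before)) {
    if (!(key in after)) patch.removed.push(key);
  }
  return patch;
}

// Replay the patch against the new document's data; last write wins per top-level key.
function applyPatch(target: Snapshot, patch: Patch): Snapshot {
  const next: Snapshot = { ...target };
  for (const key of Object.keys(patch.set)) next[key] = patch.set[key];
  for (const key of patch.removed) delete next[key];
  return next;
}
```

The result of `applyPatch` would then be written into NewHotDoc through its normal API, so that the change enters the new history as a regular update.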
I’m also not sure when you want to squash. I suggest fuzzing your system with the hot/cold storage part first to figure out what the rate of data growth of the “hot” storage is before you consider the squashing part.