HN Mirror

by curtisblaine 27 days ago

When I try to do this kind of thing with Yjs in a non-trivial way, I always run into two issues and ultimately give up, because they're really hard to handle efficiently:

1) Materializing documents. Assuming you don't keep "live" Yjs documents and only merge diffs with diffUpdate: while one or more users are connected, it's always worth keeping the blob in RAM so you can quickly merge diffs into it and save it periodically; when the document is no longer in use, you save it one last time and "ice" it in long-term storage, offloading it from RAM. I typically use an LRU cache for that. The problem is when too many users are working on too many docs and they all have to fit in RAM. How do you solve that?
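
The lifecycle in point 1 (merge each update into an in-RAM blob, save periodically, save one last time and drop it when the doc goes idle) can be sketched as an LRU whose budget is in bytes rather than in number of docs. This is a minimal sketch under stated assumptions: `DocBlobCache`, `Store` and the injected `merge` are hypothetical names, and in a real server `merge` would presumably be something like `(a, b) => Y.mergeUpdates([a, b])`. Storage is kept synchronous here only to keep the example short.

```typescript
// Hypothetical merge hook, e.g. (a, b) => Y.mergeUpdates([a, b]) with real Yjs blobs.
type Merge = (current: Uint8Array, update: Uint8Array) => Uint8Array;

// Hypothetical long-term ("iced") storage.
interface Store {
  load(docId: string): Uint8Array | undefined;
  save(docId: string, blob: Uint8Array): void;
}

interface Entry {
  blob: Uint8Array;
  dirty: boolean; // true if RAM holds changes not yet saved
}

class DocBlobCache {
  private readonly maxBytes: number;
  private readonly merge: Merge;
  private readonly store: Store;
  // Map iterates in insertion order, so the first key is the least recently used doc.
  private readonly entries = new Map<string, Entry>();
  private bytes = 0;

  constructor(maxBytes: number, merge: Merge, store: Store) {
    this.maxBytes = maxBytes;
    this.merge = merge;
    this.store = store;
  }

  // Merge an incoming update into the doc's blob, loading it from storage if needed.
  applyUpdate(docId: string, update: Uint8Array): void {
    const cached = this.entries.get(docId);
    if (cached) {
      this.entries.delete(docId); // re-inserted below as most recently used
      this.bytes -= cached.blob.byteLength;
    }
    const base = cached ? cached.blob : this.store.load(docId);
    const blob = base ? this.merge(base, update) : update;
    this.entries.set(docId, { blob, dirty: true });
    this.bytes += blob.byteLength;
    this.evictOverBudget(docId);
  }

  // Periodic save: persist dirty blobs but keep them in RAM.
  flush(): void {
    for (const [docId, entry] of this.entries) {
      if (entry.dirty) {
        this.store.save(docId, entry.blob);
        entry.dirty = false;
      }
    }
  }

  // Last user left: save once more if needed, then offload from RAM.
  release(docId: string): void {
    const entry = this.entries.get(docId);
    if (!entry) return;
    if (entry.dirty) this.store.save(docId, entry.blob);
    this.entries.delete(docId);
    this.bytes -= entry.blob.byteLength;
  }

  has(docId: string): boolean {
    return this.entries.has(docId);
  }

  get residentBytes(): number {
    return this.bytes;
  }

  // Release LRU docs until under budget; the doc just touched is never evicted.
  private evictOverBudget(keep: string): void {
    for (const docId of this.entries.keys()) {
      if (this.bytes <= this.maxBytes) break;
      if (docId !== keep) this.release(docId); // deleting during Map iteration is safe
    }
  }
}
```

Note that this only bounds RAM: if the set of actively edited docs is larger than `maxBytes`, the cache will keep evicting and reloading hot docs, which is exactly the situation the question describes.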

2) GC. Again, assuming you don't keep live documents and only merge diffs, those blobs need to be garbage collected after a while to compact them, iirc (if the doc is live, this is done automatically). This is normally a periodic process that eventually GCs all documents in turn, one after the other. If you handle that, how do you keep your server from becoming essentially unpredictable when it has to compact big blobs? GC'ing takes a toll on your CPU, and not GC'ing takes a toll on your RAM and secondary storage.
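
One way to make the periodic pass in point 2 more predictable is to run it in small time-boxed slices instead of one long sweep, and to skip blobs that haven't grown much since they were last compacted. A minimal sketch under stated assumptions: `CompactionTarget`, `growthFactor` and `minBytes` are hypothetical names, and the compaction step itself (assumed here to be something like loading the stored blob into a fresh `Y.Doc` and re-encoding it) is left to the caller.

```typescript
// Hypothetical hooks: how blobs are read, compacted and written back is up to the server.
interface CompactionTarget {
  size(docId: string): number;    // current stored blob size in bytes
  compact(docId: string): number; // compact the blob, return its new size
}

interface SchedulerOptions {
  growthFactor?: number; // recompact only after the blob grew this much since last time
  minBytes?: number;     // ignore blobs smaller than this
  now?: () => number;    // clock in ms, injectable for tests
}

class CompactionScheduler {
  private readonly target: CompactionTarget;
  private readonly growthFactor: number;
  private readonly minBytes: number;
  private readonly now: () => number;
  private readonly queue: string[] = [];
  private readonly queued = new Set<string>();
  private readonly lastCompactedSize = new Map<string, number>();

  constructor(target: CompactionTarget, opts: SchedulerOptions = {}) {
    this.target = target;
    this.growthFactor = opts.growthFactor ?? 2;
    this.minBytes = opts.minBytes ?? 64 * 1024;
    this.now = opts.now ?? (() => Date.now());
  }

  // Mark a doc as a compaction candidate (e.g. after it was saved or evicted).
  enqueue(docId: string): void {
    if (!this.queued.has(docId)) {
      this.queued.add(docId);
      this.queue.push(docId);
    }
  }

  get pending(): number {
    return this.queue.length;
  }

  // Compact queued docs round-robin until budgetMs has elapsed; returns how many were compacted.
  // The budget is checked between docs, so one very large blob can still overrun it.
  runSlice(budgetMs: number): number {
    const start = this.now();
    let compacted = 0;
    while (this.queue.length > 0 && this.now() - start < budgetMs) {
      const docId = this.queue.shift()!;
      this.queued.delete(docId);
      const size = this.target.size(docId);
      const baseline = this.lastCompactedSize.get(docId) ?? 0;
      // Skip blobs that are small or haven't grown enough to repay the CPU cost.
      if (size < this.minBytes || size < baseline * this.growthFactor) continue;
      this.lastCompactedSize.set(docId, this.target.compact(docId));
      compacted++;
    }
    return compacted;
  }
}
```

Driven from a timer, e.g. `setInterval(() => scheduler.runSlice(5), 1000)`, this caps how long each pass holds the event loop between documents; it does not make a single huge compaction any cheaper.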

philipisik 27 days ago

Interesting. What kind of content do you store in the Y.Doc? We're mostly working with text-based documents and don't really have any performance or storage issues. Yjs documents are, if created well, both really fast and small. Hocuspocus easily handles >25k concurrent user connections on single instances without any real scaling effort.

curtisblaine 25 days ago

Mainly key/values