by ajconway 1968 days ago
How do you deal with updating the local database when a client was offline for an extended period of time and missed a lot of transactions?

openDatabase loads the database's state into memory from the server, and then keeps it in sync with the server over the WebSocket. When you insert a transaction via one of the database operations, the server assigns the transaction a monotonically increasing sequence number and broadcasts the transaction, along with its sequence number, to the clients connected to the database. Clients then apply transactions to the local state in sequential order and keep track of the latest sequence number applied. When a client goes offline and comes back online, it automatically reconnects the WebSocket and re-requests any transactions above its currently applied sequence number that it may have missed. We handle reconnection logic automatically under the hood, retrying on failure with backoff delays.
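To make the catch-up step concrete, here is a minimal sketch in Python. It is not Userbase's code: the `Server` and `Client` classes, their method names, and the transaction fields are all hypothetical, and the WebSocket, encryption, and retry delays are left out. It only shows a client tracking its last applied sequence number and re-requesting whatever comes after it.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory stand-in for the server's transaction log."""
    log: list = field(default_factory=list)  # log[i] holds the transaction with seq_no i + 1

    def insert(self, command: dict) -> int:
        seq_no = len(self.log) + 1  # monotonically increasing sequence number
        self.log.append({"seq_no": seq_no, **command})
        return seq_no

    def transactions_after(self, seq_no: int) -> list:
        # Everything the caller has not applied yet.
        return self.log[seq_no:]


@dataclass
class Client:
    state: dict = field(default_factory=dict)
    last_applied: int = 0

    def apply(self, tx: dict) -> None:
        # Ignore duplicates and refuse to skip ahead: apply strictly in order.
        if tx["seq_no"] != self.last_applied + 1:
            return
        self.state[tx["item_id"]] = tx["item"]
        self.last_applied = tx["seq_no"]

    def catch_up(self, server: Server) -> None:
        for tx in server.transactions_after(self.last_applied):
            self.apply(tx)


server = Server()
client = Client()

server.insert({"item_id": "a", "item": 1})
client.catch_up(server)  # online: applies transaction 1

# The client is offline while these two are written.
server.insert({"item_id": "b", "item": 2})
server.insert({"item_id": "a", "item": 3})

client.catch_up(server)  # reconnect: re-request everything after seq_no 1
print(client.last_applied, client.state)
```

Because the client only ever asks for transactions above `last_applied`, the length of the offline period changes how much it has to replay, not how it does it.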

You can read more about this process, and how we optimize it when databases get large, here: https://github.com/smallbets/userbase/blob/master/docs/userb...

I'm also working on an offline-first Google Docs alternative that will write to IndexedDB and stay in sync with Userbase using CRDTs. The tutorial on how to do it will be here: https://userbase.com/docs/

What happens when a new client joins? Does it download the entire history of all transactions and replay them into the local database?

How do concurrent modifications get resolved (several clients trying to modify the shared state at the same time)?

>Does it download the entire history of all transactions and replay them into the local database?

This is what clients do initially, until the database grows in size. Every time the transaction log grows by 50 KB, the client takes a snapshot of the database's state at a particular point in time, compresses and encrypts it, and uploads it to the server. We call this a "bundle". This way, when clients reopen a database, they load from the bundle first and then apply any new transactions that come after it, rather than querying for the entire transaction history and decrypting and reapplying each transaction individually.
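A rough sketch of the bundle idea in Python, under stated assumptions: `make_bundle`, `open_database`, and the transaction fields are hypothetical names, JSON plus zlib stands in for whatever serialization and compression is really used, and the encryption step is omitted entirely. Only the 50 KB threshold comes from the description above.

```python
import json
import zlib

BUNDLE_THRESHOLD_BYTES = 50 * 1024  # the 50 KB figure from the description above


def apply(state: dict, tx: dict) -> None:
    state[tx["item_id"]] = tx["item"]


def make_bundle(state: dict, seq_no: int) -> dict:
    # A real client would also encrypt the payload; that step is omitted here.
    payload = zlib.compress(json.dumps(state).encode("utf-8"))
    return {"seq_no": seq_no, "payload": payload}


def open_database(bundle, log: list) -> dict:
    """Rebuild state from the bundle (if any) plus the transactions after it."""
    state, start = {}, 0
    if bundle is not None:
        state = json.loads(zlib.decompress(bundle["payload"]).decode("utf-8"))
        start = bundle["seq_no"]
    for tx in log:
        if tx["seq_no"] > start:
            apply(state, tx)
    return state


# Build a log, bundling whenever it has grown by the threshold since the last bundle.
log, state, bundle, bytes_since_bundle = [], {}, None, 0
for seq_no in range(1, 2001):
    tx = {"seq_no": seq_no, "item_id": f"item-{seq_no % 100}", "item": "x" * 64}
    log.append(tx)
    apply(state, tx)
    bytes_since_bundle += len(json.dumps(tx))
    if bytes_since_bundle >= BUNDLE_THRESHOLD_BYTES:
        bundle = make_bundle(state, seq_no)
        bytes_since_bundle = 0

# Only the transactions after the latest bundle need to be replayed one by one.
replayed_after_bundle = sum(1 for tx in log if tx["seq_no"] > bundle["seq_no"])
print(f"replayed {replayed_after_bundle} of {len(log)} transactions")
```

Opening from the bundle and opening from the full log give the same state; the bundle just shortens the replay.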

>How do concurrent modifications get resolved (several clients trying to modify the shared state at the same time)?

The server assigns each transaction a distinct sequence number via an atomic operation, so every client sees the same transactions with the same sequence numbers and applies them in the same sequential order. The client relies on this to enforce uniqueness and versioning. If 2 clients insert with the same itemId at the same time, only the insert with the lowest sequence number gets applied to the database. Similarly, if 2 clients update or delete the same item at the same time, only the update or delete with the lowest sequence number takes effect.
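Here is one way the "lowest sequence number wins" rule could look on the client, as a hedged Python sketch. The field names are hypothetical, and the `based_on` field (the version a writer last saw) is purely an assumption made for illustration; the description above does not say how stale updates are actually detected.

```python
def apply(state: dict, tx: dict) -> bool:
    """Apply one transaction; return False if it lost a race and is ignored."""
    item_id = tx["item_id"]
    current = state.get(item_id)

    if tx["command"] == "insert":
        if current is not None:
            return False  # a lower sequence number already inserted this itemId
        state[item_id] = {"item": tx["item"], "version": tx["seq_no"]}
        return True

    # Update/delete carry the version the writer last saw (assumed field).
    if current is None or current["version"] != tx["based_on"]:
        return False  # a lower sequence number already changed this item

    if tx["command"] == "update":
        state[item_id] = {"item": tx["item"], "version": tx["seq_no"]}
    else:  # "delete"
        del state[item_id]
    return True


# Two clients race twice; the server has already ordered their writes.
log = [
    {"seq_no": 1, "command": "insert", "item_id": "todo", "item": "buy milk"},
    {"seq_no": 2, "command": "insert", "item_id": "todo", "item": "buy eggs"},
    {"seq_no": 3, "command": "update", "item_id": "todo", "item": "buy oat milk", "based_on": 1},
    {"seq_no": 4, "command": "delete", "item_id": "todo", "based_on": 1},
]

state = {}
results = [apply(state, tx) for tx in log]
print(results, state)
```

Since every client runs the same deterministic rule over the same ordered log, they all drop the same losing transactions without coordinating with each other.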

With regard to bundling, it's a bit more complicated, and there are layers to our approach to handling it safely under high concurrency. When a client uploads a bundle, the database records the sequence number at which the bundling took place, so clients can use it to retrieve the latest bundle. The server also retains copies of bundles at prior sequence numbers. This way, if two clients attempt to open a database right around the moment a bundling process completes (client 1 receives a bundle at a lower sequence number, and client 2 receives a bundle at a higher sequence number), both clients end up with the same state regardless. The server sends all transactions in the log after the bundle's sequence number, so client 1 just needs to decrypt and apply more individual transactions than client 2 to rebuild the state.
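The two-client case above can be sketched like this in Python. Everything here is illustrative: the `bundles` dictionary keyed by sequence number and the `open_from_bundle` helper are hypothetical, and bundles are plain dictionaries rather than compressed, encrypted blobs.

```python
import copy

log = [
    {"seq_no": 1, "item_id": "a", "item": 1},
    {"seq_no": 2, "item_id": "b", "item": 2},
    {"seq_no": 3, "item_id": "a", "item": 3},
    {"seq_no": 4, "item_id": "c", "item": 4},
    {"seq_no": 5, "item_id": "b", "item": 5},
]


def apply(state: dict, tx: dict) -> None:
    state[tx["item_id"]] = tx["item"]


def state_at(seq_no: int) -> dict:
    """State after applying transactions 1..seq_no."""
    state = {}
    for tx in log[:seq_no]:
        apply(state, tx)
    return state


# The server keeps the older bundle around after a newer one is uploaded.
bundles = {2: state_at(2), 4: state_at(4)}


def open_from_bundle(bundle_seq_no: int):
    state = copy.deepcopy(bundles[bundle_seq_no])
    # The server sends every transaction after the bundle's sequence number.
    remaining = [tx for tx in log if tx["seq_no"] > bundle_seq_no]
    for tx in remaining:
        apply(state, tx)
    return state, len(remaining)


state_1, replayed_1 = open_from_bundle(2)  # client 1 got the older bundle
state_2, replayed_2 = open_from_bundle(4)  # client 2 got the newer bundle
print(state_1 == state_2, replayed_1, replayed_2)
```

Client 1 replays more individual transactions than client 2, but both finish with identical state.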

Some may find this interesting too -- we specifically test for safe concurrent behavior across 2 clients using a makeshift testing framework that opens 2 browsers at the same time and does some neat tests: https://github.com/smallbets/userbase/tree/master/test

If you clone the repo and run `npm run test:concurrency`, it will run those tests and output test results to the consoles of the 2 browsers.