| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Chris_Newton 1770 days ago

Indeed, there can never be one universal solution to this, because the problem is one of specification rather than (only) implementation.

For example, suppose we have an edit/delete conflict, where two clients concurrently interact with the same entity in your data model. In a simple case, we can decide to “resurrect” the affected entity and apply the edit, which is the option that never results in significant data loss and so might be a reasonable behaviour if no user interaction is involved.

Now, what if there were other consequences of deleting that entity? Maybe the client that deleted the entity then created a new entity that would violate some uniqueness constraint if both existed simultaneously. Or maybe it wasn’t the originally deleted entity that would violate that constraint, but some related one that was also deleted implicitly because of a cascade. How should we reconcile these changes, if simply allowing either one to take precedence means discarding data from the other?

At least if all clients are communicating in close to real time, it’s unlikely that any one of them will diverge far from the others before they get resynchronised, so the scope for awkward conflicts is limited. But in general, we might also need to support offline working for extended periods, when multiple clients might come back with longer sequences of potentially conflicting operations, and there’s no general way to resolve that without the intervention of users who can make intelligent decisions about intent, or at least a set of automated rules that makes sense in the context of that specific application. And in the latter case, we’d still probably want to prove that our chosen rules were internally consistent and covered all possible situations, which might not be easy.

2 comments

aboodman 1770 days ago

> How should we reconcile these changes, if simply allowing either one to take precedence means discarding data from the other?

Exactly. This is why Replicache expresses change as high-level operations, like createPost or deletePerson that are application-defined.

Replicache doesn't try to automatically merge the effects of concurrent mutations, it just replays the mutations in the same order on each client. It's up to the implementation of the mutation to decide what the correct result is, and that answer can and often does change when the mutation is replayed on top of different states.

Because Replicache mutations are atomic, applications can also enforce invariants such as uniqueness or even more complex app-level invariants.

Imagine, for example, a calendaring application. An application built with Replicache can enforce the invariant that a room is only booked by one event in one time slice even under concurrent edits, just using normal programmatic validation. It's hard to do this kind of thing with CRDTs or other approaches to automatic merging because the data model knows nothing about the application's constraints.

It's a pretty simple-minded system, actually, but our experience is that it is a nice way to think about these problems and provides good results for many types of data, in particular structured data.

soco 1770 days ago

The good old CAP theorem hits again...