I find vector clocks to introduce a lot of complexity. In our case, we just use PostgreSQL to handle ordering. When committing an event into PostgreSQL, you verify that the last committed event for your stream is still what you expect it to be (i.e. CAS), and you have strong ordering.
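A minimal sketch of that compare-and-swap append, for illustration only. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere; the `events` table, the `append_event` function, and the `ConcurrencyError` exception are hypothetical names, not anything from our actual system.

```python
import sqlite3


class ConcurrencyError(Exception):
    """Raised when the stream has moved past the version the writer expected."""


def open_store():
    # In-memory SQLite stands in for PostgreSQL here (assumption for runnability).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE events (
               stream_id TEXT    NOT NULL,
               version   INTEGER NOT NULL,
               payload   TEXT    NOT NULL,
               PRIMARY KEY (stream_id, version)
           )"""
    )
    return conn


def append_event(conn, stream_id, expected_version, payload):
    """Append at expected_version + 1, or fail if the stream has moved on."""
    try:
        with conn:  # commits on success, rolls back on any exception
            (current,) = conn.execute(
                "SELECT COALESCE(MAX(version), 0) FROM events WHERE stream_id = ?",
                (stream_id,),
            ).fetchone()
            if current != expected_version:
                raise ConcurrencyError(
                    f"expected version {expected_version}, found {current}"
                )
            conn.execute(
                "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)",
                (stream_id, expected_version + 1, payload),
            )
    except sqlite3.IntegrityError as exc:
        # The primary key catches a writer that raced us between SELECT and INSERT.
        raise ConcurrencyError("version already taken") from exc
    return expected_version + 1
```

A writer that loses the race gets a `ConcurrencyError`, reloads the stream, and decides whether to retry. The primary key on `(stream_id, version)` is what gives the ordering guarantee; the explicit check just produces a nicer error.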
I typically want to stay as far away from vector clocks as possible.
Yeah, accept those writes and favour availability. Customer was a wealth management company.
Imagine a customer has £100 in their account. System partitions. Customer withdraws £70, hitting one DC. Customer then hits the second DC, this time withdrawing £50. Each DC thinks the transaction is valid, and so serves it.
Later when the partition is restored, events are played back, and divergent history is detected via the vector clocks - the two withdrawals are not causally related. Remediative action can then be taken.
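To make the detection step concrete, here is a rough sketch of how two vector clocks can be compared. The dict-based clock representation, the `compare` function, and the `dc1`/`dc2` node names are assumptions for illustration, not the code we actually ran.

```python
def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for two vector clocks.

    Clocks are dicts mapping node name -> counter; a missing node counts as 0.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither clock dominates: divergent history


# Both DCs last agreed on the account at this point, then the partition hit.
base = {"dc1": 3, "dc2": 2}
withdraw_70 = {"dc1": 4, "dc2": 2}  # served by dc1 during the partition
withdraw_50 = {"dc1": 3, "dc2": 3}  # served by dc2 during the partition

print(compare(base, withdraw_70))         # before
print(compare(withdraw_70, withdraw_50))  # concurrent
```

The `concurrent` result is the signal that neither withdrawal saw the other, which is what triggers the reconciliation logic for that event type.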
Transactions prevent bad things happening, but require CP semantics. Eventual consistency allows AP, allows bad things to happen, meaning you have to be able to detect them and clean them up later.
There’s a long history to this debate, but in practice even Google and Amazon have settled on transactional CP systems for handling the important stuff like money (Spanner, Aurora). Expecting app developers to roll their own transactions and conflict resolution at the app layer proved intractable even with all their resources.
> Remediative action can then be taken
Sounds expensive and error-prone; taking a “read only” outage makes more sense in many use cases
Does using Kafka help mitigate this? Or should it be a producer-driven vector clock? I imagine the latter is the event-driven equivalent of optimistic locking.
Edit - yep, the event producer manages and increments the vector clock.
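Roughly what "manages and increments" looks like on the producer side, as a sketch under the usual textbook rules (bump your own counter on a local event; take the element-wise max and then bump on receive). The `tick` and `merge` names and the node labels are hypothetical.

```python
def tick(clock, node):
    """Return a copy of clock with this node's counter incremented by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local, remote, node):
    """Combine a received clock with the local one, then count the receive event."""
    merged = {
        n: max(local.get(n, 0), remote.get(n, 0))
        for n in set(local) | set(remote)
    }
    return tick(merged, node)


# Producer "a" emits an event, then later receives one stamped by producer "b".
clock_a = tick({}, "a")                     # {"a": 1}
clock_a = merge(clock_a, {"b": 2}, "a")     # {"a": 2, "b": 2}
```

Each emitted event carries a copy of the producer's clock at the time it was produced, so consumers can later tell whether two events were causally ordered or concurrent.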
---
I've not really used Kafka, so I couldn't comment on that. I did some work for a customer that involved multi-DC microservices with isolated databases (i.e. one DB per DC). We used event sourcing with vector clocks to do manual reconciliation of the databases, including during partitions. Reconciliation involved custom logic depending on the event type, so I'm not sure how a transport mechanism like Kafka would handle that.
Edit: to add on about Kafka, it guarantees a total ordering of messages within each partition (not across a whole topic), tracked by per-partition offsets rather than a vector clock. This may not be suitable for every application, but the throughput is high and it allows multiple subscribers to see the same ordering.