by joefreeman 2865 days ago
Good article. I've spent the last year migrating to an event sourced system, so thought I'd share some thoughts.

On the eventual consistency point, I've found you can get quite far by letting the read model manage the race condition. This probably doesn't work everywhere, but in our system, multiple users can accept an invitation, so we have something like `InvitationAccepted{invitation_id, user_id}`. It's possible that multiple users might accept the invitation at roughly the same time, but the command side doesn't really have to be concerned with this - it can happily allow multiple users to accept an invitation. It's up to the read model to ask, 'has this invitation already been accepted?' If not, the acceptance is successful and will be indicated when queried; otherwise the acceptance is unsuccessful (and as a bonus, we can separately record who unsuccessfully tried to accept it). From the user's point of view, when they accept the invitation, they see a spinner until we confirm with the read model one way or the other (this could be done by polling the read model, but in our case we have an event sent back to the client).

Coming up with the event schema, and getting versioning and granularity right, is hard. We have version numbers on all our events to make this a bit more manageable/explicit (`InvitationAccepted1`, for example). Storing events in a relational database does make it a bit easier to go back and edit/upgrade/delete them (sort of cheating, but also relevant for GDPR). Also, I think we're going to end up suffering a bit from the 'whole system fallacy', but at the moment namespacing all the events (keeping in mind their expected volume) makes it a lot easier to manage.

3 comments

> It's up to the read model to ask, 'has this invitation already been accepted?'

I feel like you've skipped over the interesting part of your strategy here. If it's an eventually consistent system, what keeps the read model from having the wrong answer to this question?

Not OP, but I am working with a CQRS system.

In CQRS, eventual consistency does not mean that we have multiple servers giving different answers. It means that there is a delay between a command being issued and the resulting event being propagated to all read models.

You need to handle the race condition at some point or another. From a CQRS point of view, a user accepting an invitation is just an event like any other event. Whatever happens based on that event must account for the possibility that multiple users have accepted, and that's a rather straightforward thing to solve with ES: the "accept" event with the lowest sequence number is the first.

Having the read side handle it would probably mean that when you build a read model for accepted invitations, you ignore all but the first acceptance for a given "invitationId".
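A rough sketch of what such a read model could look like, in Python. The field name `sequence` and the classes here are made up for illustration, not anyone's actual schema; the only idea taken from the thread is "lowest sequence number wins, later acceptances are recorded as unsuccessful".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InvitationAccepted:
    sequence: int        # assumed: position of the event in the log
    invitation_id: str
    user_id: str


@dataclass
class AcceptedInvitations:
    """Read model: the first acceptance per invitation wins."""
    accepted_by: dict = field(default_factory=dict)  # invitation_id -> user_id
    rejected: list = field(default_factory=list)     # (invitation_id, user_id) of late attempts

    def apply(self, event: InvitationAccepted) -> None:
        if event.invitation_id in self.accepted_by:
            # Someone got there first; keep a record of the failed attempt.
            self.rejected.append((event.invitation_id, event.user_id))
        else:
            self.accepted_by[event.invitation_id] = event.user_id


def project(events):
    """Build the read model by applying events in sequence order."""
    model = AcceptedInvitations()
    for event in sorted(events, key=lambda e: e.sequence):
        model.apply(event)
    return model
```

The command side never rejects anything here; the outcome is decided purely by the order in which the projection sees the events.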

I've been planning out an event-sourcing-like system for our healthtech startup, and you're definitely right re namespacing and versioning.

I explicitly keep a version in the event, which is a date. It's similar to how Stripe versions its API. We're also planning to handle the events the same way as Stripe's API: each event has side effects; the side effects may change depending on the version, and each version has its own application logic (cascading, so you can have 2018-08-01 run all of 2018-07-30 plus its own changes).

This lets us replay events as they happened, run an event using two different versions and perform only the diffs etc.
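To make the cascading idea concrete, here is a minimal Python sketch. The version table and the effect names (`create_lab_task`, `send_push_notification`) are hypothetical, and this version is purely additive: a real setup where a later version changes or removes an earlier side effect would need more than this.

```python
from datetime import date

# Hypothetical version table: each entry adds side effects on top of all earlier ones.
VERSIONS = [
    (date(2018, 7, 30), lambda event: ["create_lab_task"]),
    (date(2018, 8, 1), lambda event: ["send_push_notification"]),
]


def side_effects(event, version):
    """Collect effects from every version dated on or before `version`, oldest first."""
    effects = []
    for released, handler in sorted(VERSIONS, key=lambda v: v[0]):
        if released <= version:
            effects.extend(handler(event))
    return effects


def diff_versions(event, old, new):
    """Effects that `new` would produce but `old` would not."""
    before = side_effects(event, old)
    return [e for e in side_effects(event, new) if e not in before]
```

Running the same event through `side_effects` with its stored version date gives the "replay as it happened" behaviour, while `diff_versions` is one way to compute only what a newer version would add.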

Our system is probably not a typical CQRS/event sourcing setup.

The event system itself is idempotent: you take an event with all the input data necessary to run it (form data, necessary current state), so the system can run independently. This means that every event is typed such that the input data dictates what is necessary.

The event handler validates the event, returning errors if necessary.

Then, the event handler runs all side effects and returns operations to perform: update model X attributes to Y, insert new record Z.

In effect, we go:

User -> API -> (generate event) -> Event Handler -> (error|response) -> save event and side effects in a transaction.
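A minimal Python sketch of that flow, assuming SQLite for storage. The handler, table names and the `PrescriptionSubmitted` event type are invented for illustration; the point is only that the handler is a pure function returning errors or operations, and that the event and its operations are saved together in one transaction.

```python
import json
import sqlite3


def handle_prescription(event, state):
    """Hypothetical handler: validates, then returns (errors, operations). No I/O here."""
    errors = []
    if not event.get("drug"):
        errors.append("drug is required")
    if state.get("patient_active") is not True:
        errors.append("patient is not active")
    if errors:
        return errors, []
    operations = [
        ("INSERT INTO prescriptions (patient_id, drug) VALUES (?, ?)",
         (event["patient_id"], event["drug"])),
    ]
    return [], operations


def process(conn, event, state):
    """Run the handler, then save the event and its operations atomically."""
    errors, operations = handle_prescription(event, state)
    if errors:
        return errors
    with conn:  # sqlite3: commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO events (type, payload) VALUES (?, ?)",
                     ("PrescriptionSubmitted", json.dumps(event)))
        for sql, params in operations:
            conn.execute(sql, params)
    return []
```

Because the handler never touches the database itself, the caller can also choose not to persist anything and just inspect the returned operations.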

This means our DB is a cache of the event and all previous data, so we're not really event sourcing — we're audit trailing.

The main benefits are:

1. Medical records are complex and we always need audit trails.

2. If a doctor submits a prescription, we can show all side effects that happened for visibility (e.g. this triggered a lab task, a push notification, sent this message). We can verify this in the UI and see what happened for each patient without relying on assumptions.

3. Engineers know that the API produces events and can look up exactly the side effects that happen when an event occurs (we're using Rails for the API logic right now and this isn't always obvious).

4. We can ensure that we validate when an event happens based on input and current state without complex code, catching edge cases.

5. We can choose to save the event and side effects or not. This lets us "preview" actions or "replay" actions without actually changing any world state (you toggle a "test" flag in the event which also means the event handlers know not to trigger outside side effects).

6. The "side effects" response from the event handler can be sent to a websocket observable and consumed by frontends, ensuring that the doctor UI always has an up to date version of patient data.

Random thoughts:

- It's really just a framework for the logic of an application controller that's typed and ensures everything is consistent. Plus, similar to Stripe, it allows us to version events and write migrations/upgrade paths etc.

- What about conflicts? We have a plan to use hashes of the previous data to ensure consistency with medical records: if you're modifying fields A, B, C, you send over a hash of the previous data for A, B, C alongside the request. If the event handler can't verify the hash, the data must've changed in the meantime.
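The hash check in the last point could be sketched roughly like this in Python. The function names and the choice of SHA-256 over canonical JSON are assumptions for illustration, and it only works for JSON-serialisable field values.

```python
import hashlib
import json


def fields_hash(record, fields):
    """Stable SHA-256 over the selected fields (keys sorted, so ordering cannot matter)."""
    subset = {name: record.get(name) for name in fields}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_update(record, changes, expected_hash):
    """Apply `changes` only if the touched fields still match what the client saw."""
    if fields_hash(record, changes.keys()) != expected_hash:
        raise ValueError("conflict: fields changed since they were read")
    return {**record, **changes}
```

The client computes `fields_hash` over the fields it is about to modify when it loads the form and sends that value with the request. Changes to unrelated fields in the meantime do not cause a conflict, since only the touched fields are hashed.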

----

We're producing events now but the handlers aren't yet in place, so this is currently still being planned. Essentially, we're using the API as authentication, authorization, routing/HTTP management, transaction/database management while the event/controller logic is being placed into a structured framework to ingest form data, current state and produce output.

This is exactly what we did in our system and it worked wonderfully. It also has the side effect of avoiding locks or contention when doing such mutations.