Hacker News new | ask | show | jobs
by bob1029 590 days ago
The triggering of the action is a direct consequence of the information an event contains. Whether or not an action is triggered should not be the responsibility of the event.

If you are writing events with the intention of having them invoke some specific actions, then you should prefer to invoke those things directly. You should be describing a space of things that have occurred, not commands to be carried out.

By default I would only include business keys in my event data. This gets you out of traffic on having to make the event serve as an aggregate view for many consumers. If you provide the keys of the affected items, each consumer can perform their own targeted lookups as needed. Making assumptions about what views each will need is where things get super nasty in my experience (i.e. modifying events every time you add consumers).

5 comments

> each consumer can perform their own targeted lookups as needed

that puts you into tricky race condition territory, the data targeted by an event might have changed (or be deleted) between the time it was emitted and the time you're processing it. It's not always a problem, but you have to analyse if it could be every time.

It also means that you're losing information on what this event actually represents: looking at an old event you wouldn't know what it actually did, as the data has changed since then.

It also introduces a synchronous dependency between services: your consumer has to query the service that dispatched the event for additional information (which is complexity and extra load).

Ideally you'd design your event so that downstream consumers don't need extra information, or at least the information they need is independent from the data described by the event: eg a consumer needs the user name to format an email in reaction to the "user_changed_password" event? No problem to query the service for the name, these are independent concepts, updates to these things (password & name) can happen concurrently but it doesn't really matter if a race condition happens

There should be some law that says strictly serialized process should never be broken into discreet services. Distributed locks and transactions are hell.
The best way to avoid distributed locks and transactions is to manually do the work. For example, instead of doing a distributed lock on two accounts when transferring funds, you might do this (which is the same as a distributed transaction, without the lock):

1. take money from account A

2. if failed, put money back into account A

3. put money into account B

4. if failed, put money back into account A

In other words, perform compensating actions instead of doing transactions.

This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely. I've been working on this for a couple of years now and getting close to something really interesting ... but not quite there yet.

> This also requires that you have some kind of mechanism to handle an application crash between 2 and 3, but that is something else entirely

Like a distributed transaction or lock. This is the entire problem space, your example above is very naive.

You _can_ use them, but even that won't save you. If you take a distributed lock, and crash, you still have the same issue. What I wrote is essentially a distributed transaction and what happens in a distributed transaction with "read uncommitted" isolation levels. A database that supports this handles all the potential failure cases for you. However, that doesn't magically make the errors disappear or may not be even fully/properly handled (e.g., a node running out of disk space in the middle of the transaction) by your code/database. It isn't naive, it is literally a pseudo-code of what you are proposing.
It is not a DTC, despite the DB world using ATM's wrongly as an example for decades, follow their model for actually moving money and you would be sent to jail.

"Accountants don't use erasers"

The ledger is the source of truth in accounting, if you use event streams as the source of truth you can gain the same advantages and disadvantages.

An event being past tense ONLY is a very critical part.

There are lots of way to address this all with their own tradeoffs and it is a case of the least worst option for a particuar context.

but over-reliance on ACID DBMSs and the false claim that ATMs use DTC really does hamper the ability to teach these concepts.

The better version of this is sagas, which is a kind of a simplified distributed transaction. If you do this without actually using sagas, you can really mess this up.

E.g. you perform step 2, but fail to record it. When resuming from crash, you perform step 2 again. Now A has too much money in their account.

Sagas are great for this and should be used when able, IMHO. It's still possible to mess it up, as there are basically two guarantees you can make in a distributed system: at-least-once, and at-most-once. Thus, you will either need to accept the possibility of lost messages or be able to make your event consumers idempotent to provide an illusion of exactly-once.

Sagas require careful consideration to make sure you can provide one of these guarantees during a "commit" (the order in which you ACK a message, send resulting messages, and record your own state -- if necessary) as these operations are non-atomic. If you mess it up, you can end up providing the wrong level of guarantee by accident. For example:

1. fire resulting messages

2. store state (which includes ids of processed messages for idempotency)

3. ACK original event

In this case, you guarantee that you will always send results at-least-once if a crash happens between 1&2. Once we get past 2, we provide exactly-once semantics, but we can only guarantee at-least-once. If we change the order to:

1. store state

2. fire messages

3. ACK original message

We now only provide at-most-once semantics. In this case, if we crash between 1&2, when we resume, we will see that we've already handled the current message and not process it again, despite never having sent any result yet. We end up with at-most-once if we swap 1&3 as well.

So, yes, Sagas are great, but still pretty easy to mess up.

Here's how you can do this. You have 3 accounts: A, B and in-flight.

1. Debit A, Credit in-flight.

2. Credit B, Debit in-flight.

If 1. fails, nothing happened and everything is consistent.

If 2. fails, you know (because you have money left on in-flight), and you can retry later, or refund A.

This way at no point your total balance decreases, so everything is always consistent.

It can fail at your commas in 1&2, then you are just as broke as everyone else.

This isn't an easy-to-solve problem when it comes to distributed computing.

It should be an atomic transaction, double-entry style, so it can’t fail between commas.

The important thing is not having money go missing.

> Distributed locks and transactions are hell.

Which distributed transaction scenario have you ever dealt with that wasn't correctly handled by a two-phase commit or at worst a three-phase commit?

The scenario where one of the processes crashes cannot be handled by any number of commit phases.
Your event data must not be mutable.

That's kind of the first rule of any event-based system. It doesn't really matter the architecture, if you decide to name the things "event", everybody's head will break if you make them mutable.

If you decide to add mutation there in some way, you will need to rewrite the event stream, replacing entire events.

It's not about mutability of events, but about mutating the underlying data itself. If the event only says "customer 123 has been updated", and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted. Depending on the use case, this may or may not be a problem. If the consumer is only interested in the current state of the data, this typically is acceptable, but if it is needed in the complete history of changes, it is not.
Making a wacky 2-steps announcement protocol doesn't change the nature of your events.

If the consumer goes to your database and asks "what's the data for customer 123 at event F52A?" it better always get back the same data or "that event doesn't exist, everything you know is wrong".

> ... at event F52A

Sure, if the database supports this sort of temporal query, then you're good with such id-only events. But that's not exactly the default for most databases / data models.

I'm understanding what you have isn't really "events", but some kind of "notifications".

Events are part of a stream that define your data. The stream doesn't have to be complete, but if it doesn't make sense to do things like buffer or edit it, it's probably something else and using that name will mislead people.

> (...) and a consumer of that event goes back to the source of the event to query the full state of that customer 123, it may have been updated again (or even deleted) since the event was emitted.

So the entity was updated. What's the problem?

I understand gp to say that the database data is changed not the data in the event.

Surely, some data needs to change if a password is updated?

> an event might have changed (or be deleted) between the time it was emitted

Then I would argue it isn't a meaningful event. If some attributes of the event could become "out of date" such that the logical event risks invalidation in the future, you have probably put too much data into the event.

For example, including a user's preferences (e.g., display name) in a logon event - while convenient - means that if those preferences ever change, the event is invalid to reference for those facts. If you only include the user's id, your event should be valid forever (for most rational systems).

> your consumer has to query the service that dispatched the event

An unfortunate but necessary consequence of integrating multiple systems together. You can't take out a global database lock for every event emitted.

https://medium.com/geekculture/the-eight-fallacies-of-distri...

Also, CAP is a thing too.

Sure, try to keep transactions single-node. If you can't let me give you the advice of people FAR smarter than I:

- DO NOT DESIGN YOUR OWN DISTRIBUTED TRANSACTION SERVICE

Use a vetted one.

Thanks for your feedback, I appreciate it!

> The triggering of the action is a direct consequence of the information an event contains. Whether or not an action is triggered should not be the responsibility of the event.

I agree, but still for different consumers events will have different consequences - in some consumers it'll trigger an action that is part of a higher-level process (and possibly further events), in others it'll only lead to data being updated.

> If you are writing events with the intention of having them invoke some specific actions, then you should prefer to invoke those things directly. You should be describing a space of things that have occurred, not commands to be carried out.

With this I don't agree. I think that's the core of event-driven architecture that events drive the process, i.e. will trigger certain actions. That's not contradicting them describing what has occurred, and doesn't make them commands.

> By default I would only include business keys in my event data. This gets you out of traffic on having to make the event serve as an aggregate view for many consumers. If you provide the keys of the affected items, each consumer can perform their own targeted lookups as needed. Making assumptions about what views each will need is where things get super nasty in my experience (i.e. modifying events every time you add consumers).

This is feedback I got multiple times, the "notification plus callback" seems to be a popular pattern. It has its own problems though, both conceptual (event representing an immutable set of facts) and technical (high volume of events). I think digging into the pros and cons of that pattern will be one of my next blog posts! Stay tuned!

> I think that's the core of event-driven architecture that events drive the process, i.e. will trigger certain actions.

In an event-driven system, there is neither guarantee not expectation that an event will trigger an action; it might, but it might not. Events are simply a log [0] of "things" happening in various subsystems, published to various channels for other subsystems to ignore or act upon on their own terms.

Let say that we have two subsystems - A and B. When something happens on A, it will emit a corresponding event (e.g. SomethingHappened) to a specific channel (e.g. EventsFromA); if B is listening to that channel, it can "recognise" that event and initiate (i.e. "trigger") some action of its own.

However, if A explicitly wants B to do something, it's a command, i.e. a direct coupling by definition. As GP states, that is better handled as a direct request from A to B.

Theoretically, there is a possible scenario where A "knows" that a certain action needs to happen in the system, but does not know which subsystem has that capability, i.e. has no knowledge that B can do that. In that case it can "request" something to happen, e.g. by submitting an event like "UserCreationRequested"; however, there is no guarantee that any service will "see" that event and act upon it.

[0] https://engineering.linkedin.com/distributed-systems/log-wha...

Events shouldn't carry any data in my opinion, except parameterized data. In the context of a booking, for example, it would be SeatBooked {41A} instead of 41ABooked, though the latter is a better event, but harder to program for. The entire flow might looked like this:

SeatTimeLimitedReserved {41A, 15m}

SeatAssignedTo {UserA}

SeatBooked {41A}

If a consumer needs more data, there should be a new event.

Say that again louder for people in the back.

Message queues aren't a networking protocol. Anyone can subscribe to consume the events.

Another architecture might be that the service responsible for Seat Selection emits a `SeatSelected` event, and another service responsible for updating bookings emits a `BookingUpdated(Reason: SeatSelected)` "fat" event. Same for `PaymentReceived` and `TicketIssued`.

Both events would "describe a space of things that occurred" as @bob1029 suggests.

The seat selection process for an actual airline probably needs to be more involved. @withinboredom recommends:

  - SeatTimeLimitedReserved {41A, 15m}
  - SeatAssignedTo {UserA}
  - SeatBooked {41A}
In which case, only SeatBooked would trigger a BookingUpdated event.
Thanks for your feedback. I realize I should have elaborated the example a bit more, it's too vague. So, as I wrote in some other reply as well, please don't over-interpret it. The point was only to say that in order to differentiate the events, we don't necessarily need distinct types (which would result in multiple schemas on a topic), but can instead encode it in one type/schema. Like mapping in ORM - instead of "table per subclass", you can use "table per class hierarchy".