Hacker News
by SideburnsOfDoom 1331 days ago
> thin events reduce coupling. Sure, the receiver might call an API and that creates coupling with the API.

You make a statement in the first sentence, and in the next sentence produce evidence ... that the statement is wrong. And, YMMV.

It is my experience that thin events add coupling. If service B receives an event, wants to process it ASAP (i.e. near real time), and so calls back over HTTP to Service A for the details, then

a) there is additional latency for an HTTP call, and time variance: even if the average latency of an HTTP request round-trip is fine, the P99 might be bad.

b) You're asking for occasional "eventual consistency" trouble when A's state lags or has moved on ahead of the event

c) Worst of all: When service A is down or unreachable, Service B is unable to do work: Service B uptime must be <= Service A uptime. You have coupled their reliability, and if Service B is identified as mission-critical, then you have the choice of either making Service A equally critical, or decoupling them e.g. with "fat events".
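To put rough numbers on point c), here is a minimal sketch of the availability arithmetic. It assumes failures of A and B are independent (a simplification), and the uptime figures and the `effective_availability` helper are made up for illustration, not taken from any real system.

```python
def effective_availability(own: float, *sync_dependencies: float) -> float:
    """Availability of a service that needs every synchronous dependency up.

    Assumes independent failures, which real systems only approximate.
    """
    result = own
    for dep in sync_dependencies:
        result *= dep
    return result


# Hypothetical uptime figures, chosen only for illustration.
B_OWN = 0.9995  # Service B considered in isolation
A = 0.999       # Service A, which B calls back to with thin events

thin = effective_availability(B_OWN, A)  # B must reach A to do its work
fat = effective_availability(B_OWN)      # B works from the event body alone

print(f"thin events: {thin:.4%}  fat events: {fat:.4%}")
```

With the call-back in place, B can never do better than A, however reliable B itself is.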

I don't believe that it's accurate to say "receivers are also free to call or not call..." it's not choosing a flavor of ice-cream, you do the calls that the work at hand _needs_.

If you find that you never need to call back to service A then yes, "thin events" would suit your case better. That has not been my experience.

It's fair that event data format versioning is a lot of work with fat events - nothing is without downside. But in your case, do you have "dependency on the event body" ? All of it? If a thin event is all that you need, then you depend on a couple of ids in the event body, and not the rest. JSON reading is very forgiving of added / removed fields, so you can ignore the parts of a fat event that you don't care about.
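As a sketch of what "ignore the parts you don't care about" can look like, assuming a JSON event with hypothetical `event_type` and `order_id` fields (the field names and the `ids_from_event` helper are invented for this example):

```python
import json


def ids_from_event(raw: str) -> tuple[str, str]:
    """Read only the fields this consumer depends on; everything else is ignored."""
    event = json.loads(raw)
    return event["event_type"], event["order_id"]


# Hypothetical payloads: the same event in thin and fat form.
thin_event = '{"event_type": "order_placed", "order_id": "o-123"}'
fat_event = json.dumps({
    "event_type": "order_placed",
    "order_id": "o-123",
    "customer": {"id": "c-9", "name": "Example Customer"},
    "lines": [{"sku": "abc", "qty": 2}],
})

# The consumer gets the same answer from either form.
print(ids_from_event(thin_event) == ids_from_event(fat_event))
```

A consumer written this way keeps working when fields it never reads are added to or removed from the fat event.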


> You make a statement in the first sentence, and in the next sentence produce evidence ... that the statement is wrong.

My first sentence was quoting from the article, then I refute the article. Sorry if that wasn’t clear.

Re your point a), yes I agree in this case you’d send the contents in the body, but then I’d tend to call it stream processing rather than event processing - I admit this might seem like splitting hairs, but I do feel that there’s a difference between events and data distribution. And I personally find the data distribution pattern tends to be a lot more specialised.

Re b), it’s just an assumption that the receiver needs the version of data in the message, rather than the latest version. So I don’t think this is a strong argument for fat events.

Re c), again, it’s an assumption that the receiver needs the exact data provided in the event body; but I’ve found that, except in very simple cases, it’s very difficult to efficiently create event bodies that contain everything that all receivers are going to need. Maybe the receiver needs to collate a bunch more data, in which case the problem persists regardless of fat or thin, or maybe it just clears a local cache, in which case the problem is deferred until the data is needed and you probably have other things to worry about then anyway.

> I don't believe that it's accurate to say "receivers are also free to call or not call..." it's not choosing a flavor of ice-cream, you do the calls that the work at hand _needs_.

Sure, and the calls you make depend on the context, and if there is enough data in the event body to avoid making any calls at all. And I’m saying that in my experience that’s not generally the case. What I’ve seen is that the sender composes some event body and sends it, and the receivers end up needing to call APIs anyway.

In which case, the sender may as well have not gone to the trouble, hence my preference for thin events.

> But in your case, do you have "dependency on the event body" ? All of it?

From a maintenance perspective, the sender doesn’t know what the receivers depend on, so even if all your receivers only depend on the IDs, there is no way to find out. Because of this, it’s really easy to add fields to an event message, but really dangerous to remove them, because you can’t easily tell what receivers depend on the thing you’re removing. This is why I said that fat events create more coupling than thin events.

Of course as with most things there are always exceptions. Maybe I should have said, “I’m on team thin by default. But of course some use cases require fat messages, in which case proceed with great care”.

I think it's a straw man to say "we couldn't eliminate all API calls, so fat events are useless" - even removing 1 dependency at a time is a win. In my experience, you generally can do this, and that was the approach taken for reliability improvement.

> it’s very difficult to efficiently create event bodies that contain everything that all receivers are going to need.

"everything that all receivers need" seems like another straw man, a "you won't get it perfect so don't try to improve". I've seen it work well enough to be worthwhile.

> From a maintenance perspective, the sender doesn’t know what the receivers depend on

At a glance, no. But it's not imponderable, assuming a limited number of in-house consumers. The absolute statement about it isn't accurate.

> it’s just an assumption that the receiver needs the version of data in the message, rather than the latest version. So I don’t think this is a strong argument for fat events.

I've seen it cause a severe and hard-to-diagnose failure, when system A lags enough, so I think it is a strong argument.

> Maybe I should have said, “I’m on team thin by default.

Sure. I'm on team "fat events" by default because it can solve more issues than it creates. If it turns out that 90% of the event gets ignored, with no issues or http call-backs, then this might be a case for thin events.

> I think it's a straw man to say "we couldn't eliminate all API calls, so fat events are useless"

Well, yes it is a straw man, because I never said that.

> At a glance, no. But it's not imponderable, assuming a limited number of in-house consumers. The absolute statement about it isn't accurate.

That’s a pretty huge assumption. Especially when one of the advantages of pub/sub is supposed to be decoupling.

Anyway, we clearly have had different experiences, and there is no silver bullet.

> b) You're asking for occasional "eventual consistency" trouble when A's state lags or has moved on ahead of the event

If you allow A's state to lag behind its own events, then how are you ever going to create a sane system? Surely A either has to be ahead of or at the state that caused the event to emit, or events are pointless.

> A's state to lag behind its own events,

Real systems don't have just 2 services. There can be 100s and the "own events" assumption may not hold.

Sure, but in a thin events model someone would "own" the events, since otherwise the subscriber wouldn't know where to query the actual data. What would you even do with an event saying a customer changes address if querying that address then produces the old one?

I'm genuinely curious how such an architecture would work. You don't have to respond directly here, but if you have any reference to further reading, I'd appreciate it.

> I'm genuinely curious how such an architecture would work.

Complex systems are the way that they are because they got that way over time. It is not my goal to defend or even characterise a system that I did not create.

I am here telling you the issue that I saw: one event consumer, at an edge case, ran substantially behind another, and when they attempted to co-ordinate over HTTP, this failed. And how it was successfully resolved: fatter events removed the need for co-ordination between these two altogether. This was IMHO a more elegant design - it avoided the issues of the thin events.

Ah, so A and C were both subscribed to B, but during A's processing of the event it assumed C had already processed it and tried to look up some state. Is that correctly understood?

This sounds more like an architectural deficiency (as you say probably from architectural decay) than a systematic design edge case. I can't quite understand what information A would need to get from C that could be included in the fat event but not simply queried from B.

> Ah, so A and C were both subscribed to B, but during A's processing of the event it assumed C had already processed it and tried to look up some state. Is that correctly understood?

Yes, though you're down the rabbit-hole on this one issue. My point (aside from the fact we actually saw this specific issue and it took a long time to correctly diagnose) is that with thin events followed by an HTTP query call-back, you're asking for occasional "eventual consistency" trouble. Data races will happen occasionally - this is inevitable in the design.

At the tail ends of the latency distributions (too fast or too slow), or when service A is having a blip, or when you hit the new version that was just deployed, or whatever, things will go wrong through mis-sequencing in surprising and hard-to-follow ways in complex real systems; the case you're fixating on is just one example. It's a win to avoid that chaos entirely with fat events.
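A deterministic toy version of the "moved on ahead" case, as a sketch only: `ServiceA`, `ThinEvent`, `FatEvent` and the addresses are all hypothetical stand-ins, and a real race would depend on timing rather than on the order of two lines of code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinEvent:
    customer_id: str
    version: int


@dataclass(frozen=True)
class FatEvent:
    customer_id: str
    version: int
    address: str


class ServiceA:
    """Hypothetical stand-in for the service that is queried over HTTP."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[int, str]] = {}

    def update(self, customer_id: str, address: str) -> tuple[ThinEvent, FatEvent]:
        version = self.rows.get(customer_id, (0, ""))[0] + 1
        self.rows[customer_id] = (version, address)
        return (ThinEvent(customer_id, version),
                FatEvent(customer_id, version, address))

    def get(self, customer_id: str) -> tuple[int, str]:
        return self.rows[customer_id]


a = ServiceA()
thin, fat = a.update("c1", "1 Old Road")
a.update("c1", "2 New Street")  # A moves on before the consumer runs

# Thin event: the call-back sees whatever A holds now, not what caused the event.
seen_version, seen_address = a.get("c1")
print(thin.version, seen_version)  # the two versions disagree

# Fat event: the consumer works from the state captured at emit time.
print(fat.address)
```

The lagging case is the mirror image: the call-back hits a copy of A's data that has not yet applied the change the event describes.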

> If you allow A's state to lag behind its own events

That's a mischaracterization. A's state is not lagging its emitted events; instead, A's state may have been changed at the time A's event is processed.

The "own events" was the faulty assumption. It's not always the same service that both emits the events and is the place to go to over HTTP for data. It "seems logical" to also build that store from listening to events, but it can cause issues as mentioned.

The comment I quoted says:

> when A's state lags or has moved on ahead of the event

That sounds like it can EITHER be ahead or behind. Specifically, I understand it as saying that A's state can either lag OR be ahead, not that "lags" is a synonym for "moved ahead".

> b) You're asking for occasional "eventual consistency" trouble when A's state lags or has moved on ahead of the event

To be noted that this is the default if B is recovering after an outage.

Personally, I consider events to be insane. "We create an immutable database so that the state of the system is always recoverable." Okay, cool, very functional programming of you. "But then to actually work with the event from the immutable database, you have to query a stateful service." ??? What? And even fat events only go so far to get you out of that. So with a stream of n events, you don't have n states that the application can be in, but n times the product of all possible states of every other service that you query. How does this help?!

The bit you seem to be missing is the events are the source of truth, not the databases.

Lose your database? Roll up all the events. Got a lot of them? Take snapshots and then roll up from the last trusted snapshot.

In true event sourced systems, the databases and stateful systems are artefacts that can be thrown away and rebuilt. The event log is the actual “true” database.
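A minimal sketch of "roll up all the events" and "roll up from the last trusted snapshot", using a hypothetical account-balance log in the spirit of a ledger. The event shapes and the `apply_event` / `roll_up` helpers are assumptions made for the example.

```python
from functools import reduce


def apply_event(balance: int, event: dict) -> int:
    """Fold one event into the current state (here, an account balance)."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # unknown event types leave the state unchanged


def roll_up(events: list, snapshot: tuple = (0, 0)) -> int:
    """Rebuild state from the log.

    snapshot is (state, number_of_events_already_applied); the default
    replays the whole log from an empty state.
    """
    state, offset = snapshot
    return reduce(apply_event, events[offset:], state)


# Hypothetical event log: the source of truth.
log = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

full = roll_up(log)                     # replay everything
snapshot = (roll_up(log[:2]), 2)        # state after the first two events
from_snapshot = roll_up(log, snapshot)  # replay only what came after

print(full, from_snapshot)
```

Both routes arrive at the same state, which is what lets the derived database be thrown away and rebuilt.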

Once you design around that, your objections melt away.

And if you think this is some faddish trend, this is how finance has worked since the invention of bookkeeping, and how the databases under your stateful services work under the hood.

This only works if your events are in a single globally ordered stream or all your code is eventually consistent over every stream it consumes. Specifically, you cannot do the "query a service for the aggregate state" thing this article espouses for thin events, ever.
You can achieve strong eventual consistency with this system.