by SideburnsOfDoom 1328 days ago
> A's state to lag behind its own events,

Real systems don't have just 2 services. There can be 100s and the "own events" assumption may not hold.

Sure, but in a thin events model someone would "own" the events, since otherwise the subscriber wouldn't know where to query the actual data. What would you even do with an event saying a customer changed address if querying that address then returns the old one?

I'm genuinely curious how such an architecture would work. You don't have to respond directly here, but if you have any reference to further reading, I'd appreciate it.

> I'm genuinely curious how such an architecture would work.

Complex systems are the way that they are because they got that way over time. It is not my goal to defend or even characterise a system that I did not create.

I am here telling you the issue that I saw: one event consumer, at an edge case, ran substantially behind another, and when they attempted to co-ordinate over HTTP, this failed. And how it was successfully resolved: fatter events removed the need for co-ordination between these two altogether. This was IMHO a more elegant design - it avoided the issues of the thin events.

Ah, so A and C were both subscribed to B, but during A's processing of the event it assumed C had already processed it and tried to look up some state. Is that correctly understood?

This sounds more like an architectural deficiency (as you say probably from architectural decay) than a systematic design edge case. I can't quite understand what information A would need to get from C that could be included in the fat event but not simply queried from B.

> Ah, so A and C were both subscribed to B, but during A's processing of the event it assumed C had already processed it and tried to look up some state. Is that correctly understood?

Yes, though you're down the rabbit-hole on this one issue. My point (aside from the fact that we actually saw this specific issue and it took a long time to correctly diagnose) is that with thin events followed by an HTTP query call-back, you're asking for occasional "eventual consistency" trouble. Data races will happen occasionally - this is inevitable in the design.

At the tail end of the latency distributions (too fast or too slow, or service A is having a blip, or you hit the new version just deployed, or whatever), things will go wrong by mis-sequencing in surprising and hard-to-follow ways in complex real systems; the example you're fixating on is just one of them. It's a win to avoid that chaos entirely, with fat events.
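
To make the mis-sequencing concrete, here is a rough sketch in Python. Everything in it is made up for illustration (the `ThinEvent`/`FatEvent` shapes, `ServiceC`, the handler names); it is not the system I described, and the in-process `get_address` call only stands in for the HTTP query call-back. It assumes C happens to be lagging when A handles the event.

```python
from dataclasses import dataclass


@dataclass
class ThinEvent:
    customer_id: str   # only says "something changed for this customer"


@dataclass
class FatEvent:
    customer_id: str
    new_address: str   # carries the changed data itself


class ServiceC:
    """Hypothetical consumer that keeps its own copy of addresses."""

    def __init__(self) -> None:
        self.addresses = {"c1": "old street"}

    def handle(self, event: FatEvent) -> None:
        self.addresses[event.customer_id] = event.new_address

    def get_address(self, customer_id: str) -> str:
        # Stands in for the HTTP query call-back.
        return self.addresses[customer_id]


def a_handles_thin(event: ThinEvent, c: ServiceC) -> str:
    # A implicitly assumes C has already processed the same event.
    return c.get_address(event.customer_id)


def a_handles_fat(event: FatEvent) -> str:
    # No call-back, so no assumption about what C has processed.
    return event.new_address


c = ServiceC()

# C is lagging: A runs before C has handled the event.
thin_result = a_handles_thin(ThinEvent("c1"), c)
fat_result = a_handles_fat(FatEvent("c1", "new street"))

# C catches up only afterwards.
c.handle(FatEvent("c1", "new street"))

print(thin_result)  # old street
print(fat_result)   # new street
```

In the thin case A acts on stale data unless C has already caught up; in the fat case the ordering between A and C stops mattering for this lookup.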