by avereveard 1328 days ago
Assume the data format changes: it would change in the called API as well. As long as the fat event sends data in the same format that the API would return, you'd have the same level of coupling.

I think fat vs thin is more about how many other services the event has to travel through, because thin events multiply reads by a fair factor, with the tradeoff being the performance hit for the queue system to store and ship large events.
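A rough sketch of that tradeoff, in Python. The order record, the event shapes and the `cost` helper are all made up for illustration; real numbers depend on the broker and the payload.

```python
import json

# Hypothetical order record; field names are invented for this sketch.
ORDER = {"id": 42, "status": "shipped", "items": ["a", "b", "c"], "total": 99.5}

def thin_event(order):
    # Carries only the identifier; each consumer must call the API for details.
    return {"type": "order.updated", "id": order["id"]}

def fat_event(order):
    # Carries the full record, in the same shape the API would return.
    return {"type": "order.updated", "data": order}

def cost(event, subscribers):
    """Return (bytes the queue ships in total, follow-up API reads)."""
    size = len(json.dumps(event))
    reads = 0 if "data" in event else subscribers
    return size * subscribers, reads

thin_bytes, thin_reads = cost(thin_event(ORDER), subscribers=5)
fat_bytes, fat_reads = cost(fat_event(ORDER), subscribers=5)
```

With five subscribers the thin event triggers five reads against the API, while the fat event triggers none but puts more bytes through the queue.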


With an API you can publish a new endpoint (/v1, /v2 etc). It’s normally reasonably easy to maintain an old API even while you add features to the new API, and the runtime penalty is minimal because clients would be expected to call just one version of the API for any given event. (You can also see who’s calling the old API and ask them to change)

But this is not true for events. If you change the body such that you now need to maintain two versions of an event, then you have to publish both events simultaneously, which means double the server side effort, storage etc for each event version. It’s pretty inefficient, and painful. You can work out who subscribes to the old event but there is still a big efficiency hit.

You might be right about many reads per event in a simplistic way; if you have a lot of clients then it could be expensive if you don’t have a server side cache. But there would typically be a lot of temporal locality in such a system (the reads cluster right after the event), so it seems like an easy problem to solve for most use cases; you don’t have to cache for long, but caches are of course tricky if your use case is not very simple. That said, if there is already an HTTP connection open then the additional latency and bandwidth hit caused by these events is going to be minimal in most cases, and probably drowned out entirely if you need to push multiple versions.
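For what it's worth, a short-lived server side cache of the kind described could look roughly like this. It is a minimal sketch: `TTLCache`, its `loader` argument and the `loads` counter are hypothetical names, and it ignores eviction, invalidation and concurrency, which is where the tricky parts live.

```python
import time

class TTLCache:
    """Cache loader results for a few seconds to absorb a burst of reads."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # function that does the expensive read
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry can be tested
        self._entries = {}         # key -> (expires_at, value)
        self.loads = 0             # how many times the loader actually ran

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no backend read
        value = self._loader(key)
        self.loads += 1
        self._entries[key] = (now + self._ttl, value)
        return value
```

If ten subscribers react to the same thin event within the TTL, the backing store sees one read instead of ten.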

As I said in another thread, I should have said that thin is my default. There are cases when fat makes more sense, but normally I’d start with thin and see if I need to flesh it out. Whenever I’ve started fat I’ve ended up reverting.

Supporting multiple versions of an event schema is a solved problem. Apache Avro with a published schema hash in a message header is one solution.

https://avro.apache.org/
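To illustrate the idea of a schema hash in a message header, here is a sketch that does not use the Avro library at all. The SHA-256 over sorted JSON is a stand-in for Avro's own schema fingerprinting, and the header name `schema-hash`, the schemas and the in-memory `REGISTRY` are assumptions for the example.

```python
import hashlib
import json

def fingerprint(schema):
    # Stand-in for a real schema fingerprint: hash a stable JSON rendering.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two hypothetical, incompatible versions of the same event schema.
SCHEMA_V1 = {"name": "Order", "fields": ["id", "status"]}
SCHEMA_V2 = {"name": "Order", "fields": ["id", "state", "total"]}

# Published schemas, looked up by hash on the consumer side.
REGISTRY = {fingerprint(s): s for s in (SCHEMA_V1, SCHEMA_V2)}

def publish(schema, record):
    # The header identifies the writer's schema; the body is the payload.
    return {"headers": {"schema-hash": fingerprint(schema)},
            "body": json.dumps(record)}

def consume(message):
    schema = REGISTRY.get(message["headers"]["schema-hash"])
    if schema is None:
        raise LookupError("unknown schema hash")
    record = json.loads(message["body"])
    # Read only the fields the writer's schema declares.
    return {field: record.get(field) for field in schema["fields"]}
```

The consumer can tell which version it received, but it still needs code that understands that version.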

This lets you identify the version but it doesn't let old clients read the new messages. (Well, for Avro and others they still can if the new fields aren't important or the old fields aren't required, but if you can do that you also don't really have a new incompatible version and you don't need the schema hash to begin with.)

The point is that with a pull-based API, I have a fixed number of requests. As clients migrate from /v1 to /v2, load on /v1 goes down and /v2 goes up, and I can adjust resource allocations accordingly to keep the total requirements relatively constant. I can even reimplement /v1 in terms of /v2 internally in many cases and have ~0 operational overhead.
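Reimplementing /v1 in terms of /v2 can be as small as a translation layer. A sketch with plain functions standing in for HTTP handlers; the response shapes and field names are hypothetical.

```python
def get_order_v2(order_id):
    # Hypothetical /v2 handler: the only one that touches real data.
    return {"id": order_id, "state": "shipped", "total_cents": 9950}

def get_order_v1(order_id):
    # Hypothetical /v1 handler: a thin adapter over /v2, no logic of its own.
    v2 = get_order_v2(order_id)
    return {
        "id": v2["id"],
        "status": v2["state"],              # field was renamed in v2
        "total": v2["total_cents"] / 100,   # v1 reported a decimal amount
    }
```

The adapter only runs when a client actually calls /v1, so its cost shrinks as clients migrate.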

But for an evented system, as soon as just a single client wants v2 I need to publish that, and as long as any client wants v1 I need to publish that. So my outbound "work" (at the very least i/o but probably also DTO conversions and god help you if it's any kind of storage or business logic) is doubled immediately and remains doubled until everything is migrated.
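A back-of-the-envelope version of that comparison. The helper names and the ten-subscriber migration are assumptions for illustration, and it counts only messages and requests, not conversion or storage work.

```python
def api_requests(events, client_versions):
    # Pull: each client calls exactly one API version per event.
    return events * len(client_versions)

def event_publishes(events, subscriber_versions):
    # Push: every version that has at least one subscriber must be published.
    return events * len(set(subscriber_versions))

before = ["v1"] * 10          # nobody has migrated yet
during = ["v1"] * 9 + ["v2"]  # a single subscriber wants v2
after = ["v2"] * 10           # migration finished

pull_load = [api_requests(100, s) for s in (before, during, after)]
push_load = [event_publishes(100, s) for s in (before, during, after)]
```

Here `pull_load` stays flat across the migration, while `push_load` doubles the moment the first subscriber asks for v2 and only drops back once the last one leaves v1.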

API versioning is more for external users, not internal. If your API is versioned, your events should be versioned as well though, so we're back at square one. You're manufacturing a scenario where one approach is advantageous, and I agree your approach works in that scenario, but that is different from saying that one approach is advantageous a priori.