by jkarneges 2024 days ago
Another potential misuse of Kafka I've been wondering about is how a single Kafka instance/cluster is often shared by multiple microservices.

On one hand, the ability to connect multiple microservices to a central message broker is convenient, but on the other hand this goes against the microservice philosophy of not sharing subcomponents (databases, etc). I wonder where the lines should be drawn.

3 comments

I would argue that, if it's being used properly, the message broker itself is a service. It runs as a separate process, you communicate with it over an API, and its subcomponents (e.g., the database) are encapsulated.

It's all about framing and perspective, of course. But that's how I'd want to try and frame it from a system architecture point of view.

By that same reasoning Postgres is its own microservice. It runs as a separate process, you communicate with it over a well-defined API, and its subcomponents (data store, query optimizer etc) are encapsulated.

With enough framing everything is possible, and in some contexts it will even make sense.

It has to do with how it functions in practice, IMO. PostgreSQL itself is arguably a service, but the database probably is not - you're probably crawling all over its implementation details and data model.

You could take a stand and say, "All access is through stored procedures. They are the API." And, if that API operates at the same semantic level as a well-crafted REST API, then you could make an argument that that particular database is a microservice that just happens to have been implemented on top of PostgreSQL. But I don't think I've ever seen such a thing happen in the wild. It's much more popular to use an ORM to get things nice and tightly coupled.
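
As a rough illustration of the "stored procedures are the API" stance, here is a minimal Python sketch of a client that has no way to run raw SQL and can only call an allow-listed set of stored functions. Everything named here (`ProcedureOnlyClient`, `get_customer`) is hypothetical, and it assumes a DB-API 2.0 connection whose driver uses `%s` placeholders, as psycopg2 does.

```python
from typing import Any, Iterable, Sequence


class ProcedureOnlyClient:
    """Exposes a database only through an allow-list of stored functions."""

    def __init__(self, conn: Any, allowed: Iterable[str]) -> None:
        self._conn = conn  # assumed: a DB-API 2.0 connection
        self._allowed = frozenset(allowed)

    def call(self, name: str, *args: Any) -> Sequence[Any]:
        # The function name is interpolated into SQL, so it must be checked
        # against the allow-list first; arguments go through placeholders.
        if name not in self._allowed:
            raise PermissionError(f"{name!r} is not part of the published API")
        placeholders = ", ".join(["%s"] * len(args))
        sql = f"SELECT * FROM {name}({placeholders})"
        cur = self._conn.cursor()
        try:
            cur.execute(sql, args)
            return cur.fetchall()
        finally:
            cur.close()
```

A caller would write something like `client.call("get_customer", 42)` and never see a table name. A client-side wrapper is only a convention, though; to actually enforce the boundary you would presumably also revoke direct table privileges from the application role on the database side.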

Such implementations exist and were discussed here on HN a few days ago. There are also REST adapters for Postgres: https://github.com/PostgREST/postgrest

Wait, what? Isn’t that the whole point of having multiple publishers/subscribers?

I think the point was about using a single cluster for multiple topics, for different services.

Depending on the scenario I can see the point. If the microservices are all part of the larger overall solution, having a single cluster is perfectly fine. Using the same cluster for multiple "products" is a little like having one central database server for a number of different solutions. You can do it, but it can become a bottleneck, or a central point through which your different solutions impact each other's performance.

I'd agree there is an arguable difference between sharing a server vs sharing data within the server.

Bottleneck issues aside, letting two microservices connect to the same Postgres cluster but access different "databases" (collection of tables) within that cluster could be considered an acceptable data separation. Certainly with multi-tenant DBaaS systems there may be some server sharing by unrelated microservices/customers. Whereas letting two microservices access the same database tables would probably be frowned upon.

Nevertheless, sharing the same Kafka topics between microservices seems to be a common thing to do.

> Whereas letting two microservices access the same database tables would probably be frowned upon.

> Nevertheless, sharing the same Kafka topics between microservices seems to be a common thing to do.

I think if it is all part of one whole, isn’t this fine? You may have one service that generates customer-facing output, another service that powers analytics/dashboards, and yet another service that ETLs data into some data mart. Why wouldn’t they touch the same table/subscribe to the same topic (since they just need read-only access to the data)? Genuinely curious what the problem is apart from bottlenecks/performance; and if it is just a bottleneck, then wouldn’t scaling horizontally solve it?
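
One way to picture the read-only case: in a log-style broker each consumer group tracks its own read position, so several services can read the same topic without taking messages away from each other. The toy model below is only an in-memory sketch of that idea, not Kafka's client API; `TopicLog`, `poll`, and the group names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class TopicLog:
    """Append-only log with an independent read offset per consumer group."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, record: str) -> None:
        self._records.append(record)

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return the next unread records for `group` and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ("order-created", "order-paid", "order-shipped"):
    log.publish(event)

# Each group reads at its own pace; neither removes records from the log.
dashboard_batch = log.poll("dashboards", max_records=2)
etl_batch = log.poll("etl")
```

Reading is non-destructive here, so the coupling between the services is not about contention for messages. What they do share is the record format, which is where the schema question comes in.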

Sure. I believe microservice boundaries are more about development agility than scalability. By limiting each microservice to a minimal API surface and a "2 pizza" team, everyone can iterate faster. And if a particular microservice is implemented as multiple sub-services sharing database tables known only to the team in charge, that seems fine.

this isn't any different than microservices getting a deathball dependency on a user service or logging service or security service or ...

you either don't allow microservices to consume from others' topics, or you publish event schemas so they can still iterate independently.
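
A published event schema can be as simple as a versioned contract that the producing team commits to. Below is a minimal sketch in Python, assuming JSON-encoded events; the `OrderShipped` event, its fields, and the `schema_version` convention are all hypothetical. The consumer reads only the fields in the contract and ignores anything extra, so the producer can add fields without breaking it.

```python
import json
from dataclasses import dataclass

SUPPORTED_MAJOR_VERSION = 1  # hypothetical versioning convention


@dataclass(frozen=True)
class OrderShipped:
    order_id: str
    shipped_at: str  # ISO 8601 timestamp, kept as a string in this sketch


def parse_order_shipped(payload: bytes) -> OrderShipped:
    data = json.loads(payload)
    major = int(str(data["schema_version"]).split(".")[0])
    if major != SUPPORTED_MAJOR_VERSION:
        raise ValueError(f"unsupported schema version {data['schema_version']}")
    # Only contract fields are read; unknown fields are ignored on purpose.
    return OrderShipped(order_id=data["order_id"], shipped_at=data["shipped_at"])


message = json.dumps({
    "schema_version": "1.2",
    "order_id": "A-17",
    "shipped_at": "2020-01-01T12:00:00Z",
    "carrier": "added later by the producer",  # extra field, not in contract
}).encode()
event = parse_order_shipped(message)
```

In this sketch a minor version bump means "fields were added" and a major bump means "existing fields changed", which is one common convention rather than anything Kafka itself prescribes.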

the move to a 2nd kafka cluster in my experience has always been driven by isolation and fault tolerance concerns, not scalability.