HN Mirror


by hvidgaard 2672 days ago

If you back your microservice architecture with a shared RDBMS and expect transactional and referential integrity, you've only scaled some parts of your system. Instead, you should think about whether (A) you really need that scalability, and (B) what it really implies.

So, in this customers/orders example, what happens if you delete a customer? Ideally you don't; you keep the customer as long as you need the orders. You could perhaps anonymize it. And you define what happens if you cannot find the customer. Perhaps the orders should just be deleted? You could have a service running nightly that prunes orders from customers that no longer exist: a cleanup service that checks for external dependencies and prunes them as needed (note, this can be rather dangerous if done wrong).
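A minimal sketch of such a nightly prune job, assuming the order service can obtain the set of customer ids that still exist (passed in here as a plain set). `prune_orphaned_orders`, `PruneAborted` and the `max_fraction` guard are hypothetical names; the guard reflects the "dangerous if done wrong" warning by refusing to run when an unusually large share of orders would be deleted, since that more likely means the customer list is stale or empty than that many customers really vanished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str


class PruneAborted(Exception):
    """Raised when a prune run looks suspicious and is refused."""


def prune_orphaned_orders(
    orders: list[Order],
    existing_customer_ids: set[str],
    max_fraction: float = 0.05,
) -> tuple[list[Order], list[Order]]:
    """Split orders into (kept, pruned) by whether their customer still exists.

    Hypothetical safety guard: if more than `max_fraction` of all orders would
    be pruned, abort instead of deleting anything.
    """
    kept = [o for o in orders if o.customer_id in existing_customer_ids]
    pruned = [o for o in orders if o.customer_id not in existing_customer_ids]
    if orders and len(pruned) / len(orders) > max_fraction:
        raise PruneAborted(
            f"refusing to prune {len(pruned)} of {len(orders)} orders"
        )
    return kept, pruned
```

The job only computes what to prune; actually deleting (or archiving) the `pruned` list is left to the caller, which makes a dry run trivial.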

For people who have formal education in distributed systems, or who work with them, this seems very familiar, because these are some of the same things that make distributed systems hard. And that is what a microservice architecture is: a distributed system, which is why it scales well when done right.

This is also a problem we've mostly solved with CQRS and event sourcing, but it requires a lot of manual work and orchestration. It's hard, and I can only say: you probably do not need a microservice architecture, and if you do, hire people with real experience in distributed system architecture. They're expensive, and if you can't afford them, you do not need microservices.

HelloNurse 2672 days ago

The system can simply mark the "deleted" customer as a former customer and add records of their dismissal without any referential integrity problems. "Deleting" an entity doesn't mean that it should immediately vanish without a trace from the database, leaving a wake of destruction. It's only a business-level change: we won't accept further orders from deleted customers, there might be something nasty to do to their outstanding orders, and so on.
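A minimal sketch of this "mark as former customer" approach, assuming a simple in-memory model. `CustomerStatus`, `dismiss` and `dismissal_log` are hypothetical names. "Deleting" only changes the status and appends a record, so orders that reference the customer id keep resolving, while the business rule of refusing new orders is enforced in one place.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class CustomerStatus(Enum):
    ACTIVE = "active"
    FORMER = "former"


@dataclass
class Customer:
    customer_id: str
    name: str
    status: CustomerStatus = CustomerStatus.ACTIVE
    # Hypothetical audit trail: when and why the customer was dismissed.
    dismissal_log: list[tuple[datetime, str]] = field(default_factory=list)

    def dismiss(self, reason: str) -> None:
        """'Delete' the customer at the business level without removing the row."""
        self.status = CustomerStatus.FORMER
        self.dismissal_log.append((datetime.now(timezone.utc), reason))

    def can_place_order(self) -> bool:
        # Business rule: no further orders from deleted customers.
        return self.status is CustomerStatus.ACTIVE
```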

hvidgaard 2672 days ago

That doesn't address the problem of a customer somehow vanishing from the system without notifying dependent services. With at-least-once message delivery, we know that the order service will find out at some point, but we need to handle the case in the meantime.
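One way to handle that window, sketched under the assumption that reads which join orders with customer data go through a lookup that returns `None` for unknown ids (`build_order_view` and `lookup_customer_name` are hypothetical names): treat a missing customer as an expected state rather than an error.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class OrderView:
    order_id: str
    customer_id: str
    customer_name: Optional[str]
    customer_missing: bool


def build_order_view(
    order_id: str,
    customer_id: str,
    lookup_customer_name: Callable[[str], Optional[str]],
) -> OrderView:
    """Join an order with customer data, tolerating a customer that has vanished.

    `lookup_customer_name` returns None when the customer service no longer
    knows the id, e.g. because the deletion has not reached the order service.
    """
    name = lookup_customer_name(customer_id)
    return OrderView(order_id, customer_id, name, customer_missing=name is None)
```

For example, `build_order_view("o1", "c1", {"c1": "Alice"}.get)` yields a view with the name filled in, while an unknown id yields `customer_missing=True` instead of an exception.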

It could also be that we need to delete customer records per GDPR but must keep the order records due to other laws, perhaps in an anonymized form.
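A sketch of keeping the order records while stripping the personal data in them, assuming each order carries a snapshot of the customer's name and address. The field names and `anonymize_orders` are hypothetical, and exactly which fields must be removed is a legal decision that is simply assumed here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OrderRecord:
    order_id: str
    customer_id: Optional[str]
    customer_name: Optional[str]
    shipping_address: Optional[str]
    total_cents: int


def anonymize_orders(orders: list[OrderRecord], customer_id: str) -> list[OrderRecord]:
    """Strip personal fields from one customer's orders, keeping the financial data.

    The link to the customer is dropped too (customer_id=None), so the retained
    record no longer points at a person. Other customers' orders are unchanged.
    """
    return [
        replace(o, customer_id=None, customer_name=None, shipping_address=None)
        if o.customer_id == customer_id
        else o
        for o in orders
    ]
```

Because the records are frozen, the function returns new objects instead of mutating in place, which makes it easy to verify the result before persisting it.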

HelloNurse 2672 days ago

Are you advocating sending and processing notifications about customer changes so that the order component can maintain redundant, stale copies of customer data? Why would one do that instead of an appropriate, selective query when, e.g., a new order is being entered?

hvidgaard 2672 days ago

For an MSA (microservice architecture) system, yes: it should maintain just enough knowledge about customers to work, not everything. For instance, it does not need to know the customer's name, just the id. Systems that query the orders can query the customer component for additional data.

The denormalization and distribution of redundant data is required for it to scale. If you make the order component query the customer component, you haven't solved the problem, only approached it from the other direction, and suddenly you have a hard coupling where a transient failure in one component automatically fails the other.

It might not be a trade-off you're willing to make, but then you probably do not need the scaling, at least not along that vector.
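A minimal sketch of the order component keeping "just enough knowledge" about customers: a local set of known customer ids, maintained elsewhere (for example from customer events), so that creating an order never calls the customer component synchronously. `OrderService`, `remember_customer`, `forget_customer` and `UnknownCustomer` are hypothetical names.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str  # only the id; name etc. stay in the customer component


class UnknownCustomer(Exception):
    pass


class OrderService:
    def __init__(self) -> None:
        # Local, possibly slightly stale copy of which customer ids exist.
        self._known_customers: set[str] = set()
        self._orders: dict[str, Order] = {}
        self._ids = itertools.count(1)

    def remember_customer(self, customer_id: str) -> None:
        self._known_customers.add(customer_id)

    def forget_customer(self, customer_id: str) -> None:
        self._known_customers.discard(customer_id)

    def create_order(self, customer_id: str) -> Order:
        # Validated against local state only: a transient outage of the
        # customer component does not make order creation fail.
        if customer_id not in self._known_customers:
            raise UnknownCustomer(customer_id)
        order = Order(f"order-{next(self._ids)}", customer_id)
        self._orders[order.order_id] = order
        return order
```

The price is exactly the staleness raised in the question above: for a short while the local set may still accept a customer that has just been deleted.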

tirumaraiselvan 2672 days ago

How does CQRS solve the problem of referential integrity?

hvidgaard 2672 days ago

With a distributed system, there is no such thing as referential integrity. You can only mitigate the issue, and CQRS is good at that.

tirumaraiselvan 2672 days ago

Can you give me a concrete example of using CQRS in such a setting and the problem it is solving?

hvidgaard 2672 days ago

You have to use event sourcing as well, CQRS alone does not solve it.

First you have to decide what trade-offs you want. Ideally you would not expose any events from a service, but that is not realistic. And since the two services have some degree of connection, let's make the trade-off that we expose customer creation and deletion events so other systems can keep track of a current list of customers. We use an at-least-once delivery mechanism for events.

The orders service would subscribe to the two events and maintain an internal list of currently active customers. You cannot create orders for customers that do not exist, and when a customer is deleted, you can do what you need to do with the orders.
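A minimal sketch of that projection, assuming the customer service publishes `CustomerCreated` and `CustomerDeleted` events (hypothetical names) that each carry a unique `event_id`. Because delivery is at-least-once, the same event can arrive more than once, so the handler remembers applied event ids and ignores duplicates. The `on_deleted` callback is where "do what you need to do with the orders" would go. The sketch also assumes events for a given customer arrive in order; tolerating reordering would need something more, such as per-customer sequence numbers.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class CustomerCreated:
    event_id: str
    customer_id: str


@dataclass(frozen=True)
class CustomerDeleted:
    event_id: str
    customer_id: str


Event = Union[CustomerCreated, CustomerDeleted]


class ActiveCustomers:
    """Order-side projection of which customers currently exist."""

    def __init__(self, on_deleted: Callable[[str], None]) -> None:
        self.active: set[str] = set()
        self._seen: set[str] = set()  # ids of events already applied
        self._on_deleted = on_deleted

    def handle(self, event: Event) -> None:
        # At-least-once delivery: a redelivered event is a no-op.
        if event.event_id in self._seen:
            return
        self._seen.add(event.event_id)
        if isinstance(event, CustomerCreated):
            self.active.add(event.customer_id)
        elif isinstance(event, CustomerDeleted):
            self.active.discard(event.customer_id)
            # Hook for deciding what happens to this customer's orders.
            self._on_deleted(event.customer_id)

    def can_create_order(self, customer_id: str) -> bool:
        return customer_id in self.active
```

Deduplicating on event id keeps the side effect in `on_deleted` from running twice when a deletion event is redelivered.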
