by quizotic 2672 days ago
There's a bit of an emperor's clothes problem with microservices. If microservices are never combined together, they are essentially monoliths under a cooler name. But as soon as you combine them, you run into the same problems that are solved by traditional shared database systems. Here are two: cross-microservice referential integrity, and cross-microservice transaction coordination, but there are plenty more.

Take the Orders/Customers example of the article. Assume the Customers microservice has some notion of identity, and that the Orders microservice contains a "foreign" reference to Customers identity. If the microservices are truly independent and autonomous, then the Customers microservice should be able to change its scheme for Customer identities, or delete a Customer. But then what happens to the Orders that reference that now old and outdated Customer identity? If changing the Customers microservice breaks the Orders microservice, then what's the point of the separate encapsulation? Traditional shared database systems have ways to deal with this kind of referential integrity. As far as I can tell, this issue is ignored by microservice architectures. As is the larger issue of cross-microservice constraints (of which referential integrity is just one instance).

The thing that really gets my goat is the lack of cross-microservice transaction coordination. In its place, we get all sorts of hand-waving about "eventual consistency" and "compensating transactions". Growl. Eventual consistency has a meaning that's related to distributed/replicated storage and the CAP theorem. Maybe its principles can apply to microservices, but the onus is on the microservice proponents to actually connect those dots. And most importantly, eventual consistency is implemented and supported by the DBMS, not by application developers. The thought that each microservice will independently implement the transactional guarantees of consistency and isolation should fill everyone with dread. That job should belong to the overarching system, not to individual microservices. It's too hard to get right.

So for now, a shared database in microservices is not an anti-pattern. It may be the only workable pattern. When microservice frameworks grow up and offer the capabilities of shared database systems to microservice developers, then we can talk about anti-patterns.

6 comments

If you back your microservice architecture with a shared RDBMS and expect transactional and referential integrity, you've only scaled some parts of your system. Instead you should think about (A) whether you really need that scalability, and (B) what its real implications are.

So, in this customers/orders example, what if you delete a customer? Ideally you don't; you keep the customer as long as you need the orders. You could perhaps anonymize it. And you define what happens if you cannot find the customer. Perhaps the orders should just be deleted? You could have a service running nightly that prunes orders from customers that do not exist anymore: a cleanup service that checks for external dependencies and prunes them as needed (note, this can be rather dangerous if done wrong).
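A minimal sketch of such a nightly cleanup in Python, to make the "dangerous if done wrong" part concrete. It assumes orders are plain dicts with hypothetical `order_id` and `customer_id` fields and that the list of existing customer IDs was fetched from the customer service beforehand; none of these names come from the article. It defaults to a dry run and refuses to act on an empty customer list, since an empty list is more likely a failed lookup than a real mass deletion.

```python
def find_orphaned_orders(orders, existing_customer_ids):
    """Return the orders whose customer_id is not among the known customers."""
    known = set(existing_customer_ids)
    return [o for o in orders if o["customer_id"] not in known]


def prune_orders(orders, existing_customer_ids, dry_run=True):
    """Return (kept, orphaned). With dry_run=True no order is dropped."""
    known = set(existing_customer_ids)
    if not known:
        # Guard (an assumption of this sketch): an empty customer list most
        # likely means the lookup failed, so pruning would wipe every order.
        raise ValueError("refusing to prune against an empty customer list")

    orphaned = find_orphaned_orders(orders, known)
    if dry_run:
        # Report what would be removed, but keep everything.
        return list(orders), orphaned

    orphan_ids = {o["order_id"] for o in orphaned}
    kept = [o for o in orders if o["order_id"] not in orphan_ids]
    return kept, orphaned
```

Running it with `dry_run=True` first and reviewing the orphaned list before a real run is one cheap way to limit the damage of a bad customer snapshot.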

For people who have formal education in distributed systems, or work with them, this seems very familiar, because these are some of the same things that make distributed systems hard. And that is what a microservice architecture is: a distributed system, which is why it scales well when done right.

This is also a problem we've mostly solved with CQRS and event sourcing, but it requires a lot of manual work and orchestration. It's hard and I can only say - you probably do not need a microservice architecture, and if you do, hire people with real experience in distributed system architecture. They're expensive, and if you can't afford it you do not need microservices.

The system can simply mark the "deleted" customer as a former customer and add records of their dismissal without any referential integrity problems. "Deleting" an entity doesn't mean that it should immediately vanish without a trace from the database, leaving a wake of destruction. It's only a business-level change: we won't accept further orders from deleted customers, there might be something nasty to do to their outstanding orders, and so on.
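As a rough illustration of "deleting" as a business-level change, here is a small Python sketch. The `Customer` fields and the helper names are hypothetical, chosen only for this example: the record stays in place (so order references remain valid) and only gains a `deleted_at` timestamp, which the order-entry check consults.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    name: str
    deleted_at: Optional[datetime] = None  # None means still an active customer

    @property
    def is_active(self) -> bool:
        return self.deleted_at is None


def delete_customer(customer: Customer, now: Optional[datetime] = None) -> None:
    """Mark the customer as a former customer; the row itself is kept."""
    if customer.deleted_at is None:  # keep the first deletion timestamp
        customer.deleted_at = now or datetime.now(timezone.utc)


def can_place_order(customer: Customer) -> bool:
    """Business rule from the comment: no further orders from deleted customers."""
    return customer.is_active
```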
That doesn't address the problem of a customer somehow vanishing from the system without notifying dependent services. With at-least-once message delivery we know that the order service will find out at some point, but we need to handle the case in the meantime.

It is also possible that we need to delete customer records per GDPR, but need to keep the order records due to other laws, perhaps in an anonymized form.

Are you advocating sending and processing notifications about customer changes so that the order component can maintain redundant stale copies of customer data? Why would one do that instead of an appropriate and selective query when e.g. a new order is being entered?
For an MSA system, yes: it should maintain just enough knowledge about customers to work, not everything. For instance, it does not need to know the customer's name, just the ID. Systems that query the orders can query the customer component for additional data.

The denormalization and distribution of redundant data is required for it to scale. If you make the order component query the customer component, you haven't solved the problem, you've only turned it around: suddenly you have a hard coupling where a transient failure in one component automatically fails the other.

It might not be a tradeoff you're willing to make, but then you probably do not need the scaling - at least not along that vector.

How does CQRS solve the problem of referential integrity?
With a distributed system, there is no such thing as referential integrity. You can only mitigate the issue, and CQRS is good at that.
Can you give me a concrete example of using CQRS in such a setting and the problem it is solving?
You have to use event sourcing as well; CQRS alone does not solve it.

First you have to decide what trade-offs you want. Ideally you will not expose any events from a service, but that is not realistic. And since the two services have some degree of connection, let's make the trade-off that we want to expose events for the creation and deletion of customers so other systems can keep track of a current list of customers. We use an at-least-once delivery mechanism for events.

The orders service would subscribe to the two events and maintain an internal list of currently active customers. You cannot create orders for customers that do not exist, and when a customer is deleted, you can do what you need to do with the orders.
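To make that concrete, here is a minimal in-memory sketch in Python. The event shape (`event_id`, `kind`, `customer_id`) and the class names are assumptions made for this example, not any particular framework's API. Because delivery is at-least-once, the handler drops events it has already seen, and it remembers deleted customers so that a late "created" event cannot bring one back.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CustomerEvent:
    event_id: str      # unique per event, used to drop redelivered duplicates
    kind: str          # "created" or "deleted" (hypothetical event names)
    customer_id: str


@dataclass
class OrdersService:
    active_customers: set = field(default_factory=set)
    deleted_customers: set = field(default_factory=set)
    seen_event_ids: set = field(default_factory=set)
    orders: list = field(default_factory=list)

    def handle(self, event: CustomerEvent) -> None:
        # At-least-once delivery: the same event may arrive more than once.
        if event.event_id in self.seen_event_ids:
            return
        self.seen_event_ids.add(event.event_id)

        if event.kind == "created":
            # A late "created" must not resurrect a deleted customer.
            if event.customer_id not in self.deleted_customers:
                self.active_customers.add(event.customer_id)
        elif event.kind == "deleted":
            self.active_customers.discard(event.customer_id)
            self.deleted_customers.add(event.customer_id)
            # What happens to existing orders is a business decision;
            # this sketch only stops new orders from being accepted.

    def create_order(self, customer_id: str, item: str) -> bool:
        # Orders are only accepted for customers this service knows as active.
        if customer_id not in self.active_customers:
            return False
        self.orders.append((customer_id, item))
        return True
```

Note that the check is against the orders service's own, possibly stale, view of customers. That staleness is the trade-off being accepted here.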

I'm just an ignorant nobody, but why would you split a microservice off along a facet that would require transactions and referential integrity? Wouldn't you want to pick off pieces that could be truly independent and not need to care about transactions or referential integrity? Otherwise, what problem are you actually solving by splitting the piece off?
If you never need to coordinate two or more microservices, then sure. But that means you're dealing with a monolith. In the original article, you could combine Orders and Users into a single microservice. That would resolve all the issues that are raised ... except that you might want to reference the Users in a different context. At that point, you either have to split Users into their own microservice, or duplicate the User information. Either way, you're backed into consistency issues.

Part of the promise of microservices is that they're small modular independent components that you can connect and combine into higher-level services.

If you need to read information from and write information to two or more microservices, then you have transaction issues. If you need to relate information across two microservices, you have referential integrity issues. It just comes with the terrain.

Well, that isn’t a problem when you don’t care about the situation right now. For instance, let’s say you have VideoDescription, VideoSubtitling, and VideoContent as different microservices serving a description, the subtitles, and a handle to a list of content chunks. You need these things to line up (in that you may want all this for a single video). If VideoCentral says that you no longer have a movie it doesn’t matter. You can still serve video descriptions, subtitles, and content chunks until your view of the world changes. No big deal. If it is a big deal, then maybe the model doesn’t fit your problem. But for lots of software it really doesn’t matter. You don’t need instant consistency. If it’s consistent at some point that will do.

At no point will the writes insta-percolate. Instead you’ll push the writes into an event queue, they’ll execute eventually, and when the consuming services eventually update they’ll read the new result.
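A toy Python sketch of that flow, using an in-process `deque` as a stand-in for a real event queue. The class and method names are made up for this example. Reads keep returning the old view until the consumer gets around to draining the queue.

```python
from collections import deque


class DescriptionReadModel:
    """Serves video descriptions from a local view updated via queued writes."""

    def __init__(self):
        self.descriptions = {}   # what readers currently see
        self.pending = deque()   # stand-in for an external event queue

    def publish(self, video_id, text):
        # The write is only enqueued; readers do not see it yet.
        self.pending.append((video_id, text))

    def read(self, video_id):
        # May return stale data (or None) until pending writes are applied.
        return self.descriptions.get(video_id)

    def process_pending(self):
        # The consumer catches up "eventually", applying writes in order.
        while self.pending:
            video_id, text = self.pending.popleft()
            self.descriptions[video_id] = text
```

Between `publish` and `process_pending`, a reader simply gets the previous description, which is exactly the window in which the system is not yet consistent.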

> Part of the promise of microservices is that they're small modular independent components that you can connect and combine into higher-level services.

Sounds to me that the marketing copy for microservices is missing a crucial observation: the reason independent components aren't hard to work with in regular software is partly because everything runs single-threaded, or if you end up multithreading, the response is predictable and near real-time, the environment is reliable and under your control. These conditions essentially mask transactional and integrity issues, which only become apparent as you scale to multiple machines connected over a network.

You're right that a shared resource like a database does endanger one of the microservice benefits and I think we're on the same page when I say the reality is "this is fine." You still get the benefit of being able to deploy, version, scale and manage a single microservice separately.

You might eventually have to break out the microservice entirely and build it out such that the shared data model is fully represented in the API and not the database. That's fine too. That's a lot of work, but you're only closer to that goal when you start with SOA and microservices, not further away. Sure, you can get into trouble if you build a large interconnected monolith that you happen to be running in parts across many servers. I would, uh... advise against that. Try to separate your services in natural ways that won't cause massive headaches.

Basically, there are still some benefits, and this critique, while valid, only points out what you aren't getting for free. It's not actually a negative.

>But then what happens to the Orders that reference that now old and outdated Customer identity? If changing the Customer microservice breaks the Orders microservice, then what's the point of the separate encapsulation?

Isn't that what append-only event sourcing and CQRS are designed to solve?

If you’re putting a foreign key from the customer service into the orders service, then the customer service’s key definition, once used as a foreign key, cannot change.
> If microservices are never combined together, they are essentially monoliths under a cooler name.

This assertion completely misses the whole point of microservices: to have highly specialized services, each with a single and very limited responsibility, which clients can query independently and which let systems be scaled (even horizontally) on their performance bottlenecks alone.

Describing a microservice architecture as a bunch of monoliths is simply missing the whole point.