by mindcrime 2245 days ago
Saga Pattern.

https://blog.couchbase.com/saga-pattern-implement-business-t...

https://microservices.io/patterns/data/saga.html

https://developers.redhat.com/blog/2018/10/01/patterns-for-d...

https://www.enterpriseintegrationpatterns.com/patterns/conve...

https://en.wikipedia.org/wiki/Compensating_transaction

3 comments

I think these kinds of patterns add a huge amount of complexity to the system. Also, testing them properly will be quite challenging.
I agree. I like these patterns, but I never encourage anyone to start with them. First build a simple monolith to handle the situation; if a hard problem remains, then and only then should these be applied. These days I am seeing quite the opposite, though: people reach for these patterns by default, as a rule of thumb, and I don't see enough evidence to justify starting with them.
There's no question that it adds complexity, but I wouldn't agree that it adds a "huge amount." The Saga Pattern is actually very straightforward.

But as with anything, the question is "is this complexity worth the price you pay for it?" For me, given the advantages that microservices offer in many contexts, I find that using the Saga Pattern to maintain consistent state is totally worth it. But that won't be true for everybody in every situation.

Any library/SDK that allows you to implement these patterns should have sufficient testing scaffolding available as well. We use MassTransit for a large, distributed .NET Core + RabbitMQ service layer, and unit tests are no more trouble than they usually are with the built-in Bus and Consumer test harnesses.
If you build your system around events then the saga pattern doesn't take much additional effort.
What I always try to learn is whether we need these patterns in the first place, and whether we always need them. But if we do, then they seem handy.
Yes, this is an all but solved problem. Sagas handle this as gracefully as you can in a distributed transaction, and with even a modicum of foresight a lot of these problems can be avoided.
Do sagas help with rolling back in response to errors? This seems like the nastiest aspect of any distributed transaction approach: step A succeeds, step B succeeds, step C fails, call to rollback A and B fails... and now?

Or you do a two-phase commit: A, B, and C tentatively succeed, but then one of the commit calls fails, and now?

It seems like inconsistencies are inevitable no matter what you do.

The post you're replying to does list compensating transactions (a form of rollback).
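
To make the A/B/C scenario concrete, here is a minimal sketch of a saga runner with compensating transactions. Everything in it (`run_saga`, `SagaFailed`, the `(name, action, compensation)` step shape) is hypothetical and in-memory, not taken from any of the linked articles or from a real library. It undoes completed steps in reverse order and reports any compensation that itself failed, which is exactly the "and now?" case.

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    """Raised when a step fails; `stuck` names steps whose undo also failed."""

    def __init__(self, failed_step: str, stuck: List[str]):
        super().__init__(f"step {failed_step!r} failed; uncompensated: {stuck}")
        self.failed_step = failed_step
        self.stuck = stuck


def run_saga(steps: List[Step]) -> None:
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            # Undo everything that already succeeded, newest first.
            stuck: List[str] = []
            for done_name, _, compensate in reversed(completed):
                try:
                    compensate()
                except Exception:
                    # The compensation itself failed: record it for follow-up.
                    stuck.append(done_name)
            raise SagaFailed(name, stuck)
        completed.append(step)
```

If C fails after A and B succeeded, the runner calls B's compensation and then A's. Anything left in `stuck` is state the saga could not repair on its own, so the pattern narrows the inconsistency window rather than eliminating it.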

One gotcha that is not covered by Sagas (I could be wrong) is when one or many of the network paths involved in the distributed tx become unreachable (network partition event) and you have no idea of the state of that part of the tx. Do you re-try that part and risk sending the same instruction twice (ok in some cases but not all) vs risk of having sent no instruction? If I had to implement a distributed tx I would first verify my mental model using TLA+ and use a (persistent) transactional messaging system with at-least-once delivery as the backbone, and make other accommodations for such scenarios.

Do you re-try that part and risk sending the same instruction twice (ok in some cases but not all) vs risk of having sent no instruction?

If you can make your compensating action idempotent, then yes, you can just keep retrying it. If it can't be made so for whatever reason, then a failure at that point demands manual intervention.
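
A rough sketch of that rule, assuming the compensating action is idempotent so repeating it is harmless: retry with backoff a bounded number of times, then give up and flag the case for a human. The names (`retry_compensation`, `NeedsManualIntervention`) and the default retry numbers are made up for illustration.

```python
import time
from typing import Callable, Optional


class NeedsManualIntervention(Exception):
    """Retries are exhausted; a person has to reconcile the state."""


def retry_compensation(
    compensate: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call an idempotent compensation until it succeeds.

    Returns the attempt number that succeeded. Only safe when running
    `compensate` more than once has the same effect as running it once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            compensate()
            return attempt
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff between tries: 0.1s, 0.2s, 0.4s, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise NeedsManualIntervention(
        f"compensation failed {attempts} times"
    ) from last_error
```

A refund keyed by order ID is a typical idempotent compensation: applying it a second time finds the refund already recorded and does nothing. "Decrement the balance by 10" is not, unless it carries a unique operation ID.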

I suppose redundant communication channels (going over different network modalities, e.g., data-center native, satellite, 5G, etc.) can be used to recover from a network partition. Still, having a protocol with an at-least-once delivery guarantee is important, as it ensures that no messages are lost due to an unexpected crash of the sender/caller or the receiver/callee.
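
At-least-once delivery means the receiver will sometimes see the same message twice, so it usually gets paired with deduplication on the receiving side. A minimal in-memory sketch, assuming every message carries a unique ID assigned by the sender (the `IdempotentReceiver` class is hypothetical, not from any messaging library):

```python
from typing import Any, Callable, Set


class IdempotentReceiver:
    """Drops redelivered messages by remembering the IDs already handled."""

    def __init__(self, handler: Callable[[Any], None]):
        self._handler = handler
        # Assumption: a real system keeps this in durable storage and
        # commits it atomically with the handler's side effects; otherwise
        # a crash between the two steps can still cause a double apply.
        self._seen: Set[str] = set()

    def receive(self, message_id: str, payload: Any) -> bool:
        """Return True if the message was processed, False if a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        self._seen.add(message_id)
        return True
```

With this in place the sender can resend freely after a timeout: a repeated "debit 10" with the same ID is applied once, which turns the "sent twice vs. not sent at all" dilemma into "always resend".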
It seems like inconsistencies are inevitable no matter what you do.

At some level, barring guaranteed message delivery (which is effectively non-existent in any distributed system), you always reach a point where you can't guarantee consistency. It's the Byzantine Generals Problem, basically.

https://en.wikipedia.org/wiki/Byzantine_fault

But based on empirical evidence, you can work out that a certain measure of effort dedicated to fault tolerance will yield correct results in X% of cases, and you can tune the value of X based on how much time/energy/money/effort you're willing to expend... up to a point.
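
As a back-of-the-envelope illustration of tuning X, assume (and this is a strong assumption that real networks often violate) that each delivery attempt fails independently with the same probability. Then the chance that at least one of n attempts gets through is 1 - p^n, and you can ask how many retries a target X costs. The function names below are made up for the sketch.

```python
def success_rate(p_fail: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - p_fail ** attempts


def attempts_needed(p_fail: float, target: float) -> int:
    """Smallest number of attempts whose success rate reaches `target`."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while success_rate(p_fail, n) < target:
        n += 1
    return n
```

Under that model, a 20% per-attempt failure rate needs 3 attempts to reach 99% and 5 to reach 99.9%. Each extra nine costs more retries, and no finite number of them reaches 100%, which is the "up to a point" part.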

Yes, it does... something similar enough.

Accounting has been doing that for centuries already, so it's not new by any means. It's also not free: it imposes severe restrictions on your system's architecture and the kinds of problems it can solve.