Sagas are great for this and should be used when able, IMHO. It's still possible to mess it up, as there are basically two guarantees you can make in a distributed system: at-least-once, and at-most-once. Thus, you will either need to accept the possibility of lost messages or be able to make your event consumers idempotent to provide an illusion of exactly-once.
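
To make the idempotency part concrete, here is a minimal sketch of a consumer that remembers which message ids it has already handled. The class name, the in-memory set, and the "balance" effect are all made up for illustration; a real consumer would keep the ids in durable storage.

```python
class IdempotentConsumer:
    """Applies each message's effect once, even if it is delivered twice."""

    def __init__(self):
        # Hypothetical dedup store; assumed in-memory here, durable in practice.
        self.processed_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.balance += amount
        self.processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
consumer.handle("msg-1", 10)
consumer.handle("msg-1", 10)  # redelivery of the same message, ignored
consumer.handle("msg-2", 5)
print(consumer.balance)  # 15
```

Delivery is still at-least-once underneath; the dedup check is what makes it look like exactly-once to everything downstream.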

Sagas require careful consideration to make sure you can provide one of these guarantees during a "commit" (the order in which you ACK a message, send resulting messages, and record your own state -- if necessary) as these operations are non-atomic. If you mess it up, you can end up providing the wrong level of guarantee by accident. For example:

1. fire resulting messages

2. store state (which includes ids of processed messages for idempotency)

3. ACK original event

In this case, results are sent at-least-once: if we crash between 1&2, the original event is never ACKed, so it gets redelivered and the results are fired a second time. Once we get past 2, the stored ids let us skip duplicates, so it looks like exactly-once, but the guarantee overall is still only at-least-once. If we change the order to:

1. store state

2. fire messages

3. ACK original message

We now only provide at-most-once semantics. In this case, if we crash between 1&2, when we resume, we will see that we've already handled the current message and not process it again, despite never having sent any result yet. We also end up with at-most-once if we ACK first (swapping 1&3), since an ACKed message is never redelivered and a crash right after the ACK loses the results.
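
The orderings above can be played out in a small simulation. This is only a sketch under assumptions: the Worker class, the step names, and the deliver helper are invented for illustration, the "durable" state is just an in-memory set that survives the simulated crash, and redelivery is modeled as "retry anything that was never ACKed".

```python
class Crash(Exception):
    """Simulated process crash."""


class Worker:
    def __init__(self, order):
        self.order = order          # e.g. ("fire", "store", "ack")
        self.processed_ids = set()  # stored state, assumed to survive a crash
        self.sent = []              # results that reached downstream
        self.acked = set()          # messages that will not be redelivered

    def handle(self, msg_id, crash_after=None):
        if msg_id in self.processed_ids:
            self.acked.add(msg_id)  # already handled: ACK and emit nothing
            return
        for step in self.order:
            if step == "fire":
                self.sent.append(f"result-of-{msg_id}")
            elif step == "store":
                self.processed_ids.add(msg_id)
            elif step == "ack":
                self.acked.add(msg_id)
            if step == crash_after:
                raise Crash(step)


def deliver(worker, msg_id, crash_after):
    """Deliver once with a crash, then redeliver only if never ACKed."""
    try:
        worker.handle(msg_id, crash_after=crash_after)
    except Crash:
        pass
    if msg_id not in worker.acked:
        worker.handle(msg_id)  # redelivery after restart
    return worker.sent


at_least_once = deliver(Worker(("fire", "store", "ack")), "m1", crash_after="fire")
at_most_once = deliver(Worker(("store", "fire", "ack")), "m1", crash_after="store")
ack_first = deliver(Worker(("ack", "fire", "store")), "m1", crash_after="ack")
print(len(at_least_once), len(at_most_once), len(ack_first))  # 2 0 0
```

The first ordering sends the result twice (a duplicate, so at-least-once), while the other two send nothing at all (a lost result, so at-most-once).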

So, yes, Sagas are great, but still pretty easy to mess up.