by lorendsr
1111 days ago
At some point, if you can't automatically fix something, you have to stop and report to a human for manual intervention/repair. While a saga doesn't guarantee that you avoid manual repair, it significantly reduces the need for it. Suppose each of these has a 1% chance of non-retryable failure:

    Step1
    Step2
    Step1Undo

Then this has a roughly 1% chance of needing manual repair (it's okay if Step1 fails, but if Step1 succeeds and Step2 fails, we need to repair):

    do Step1
    do Step2

and this has a roughly 0.01% chance (we only repair if both Step2 and Step1Undo fail, 1% * 1%):

    do Step1
    try {
      do Step2
    } catch {
      do Step1Undo
    }
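
For anyone who wants to play with it, here is a minimal Python sketch of the try/catch pattern above. The names `run_saga` and `NeedsManualRepair` are made up for illustration, the steps are plain callables, and the probabilities assume the three failures are independent. It is not how any particular workflow engine implements sagas.

```python
from typing import Callable


class NeedsManualRepair(Exception):
    """Hypothetical error: Step2 failed and Step1Undo failed too."""


def run_saga(step1: Callable[[], None],
             step2: Callable[[], None],
             step1_undo: Callable[[], None]) -> str:
    # If Step1 itself fails there is nothing to undo; the error just propagates.
    step1()
    try:
        step2()
    except Exception:
        # Step1 succeeded but Step2 did not: compensate by undoing Step1.
        try:
            step1_undo()
        except Exception as undo_error:
            # Only this path leaves the system inconsistent.
            raise NeedsManualRepair("Step2 and Step1Undo both failed") from undo_error
        return "compensated"
    return "completed"


# Arithmetic from the comment, assuming independent failures with p = 1%.
P_FAIL = 0.01
# No undo: repair whenever Step1 succeeds and Step2 fails (about 0.99%).
P_REPAIR_WITHOUT_UNDO = (1 - P_FAIL) * P_FAIL
# With undo: Step1Undo must fail as well (about 0.0099%).
P_REPAIR_WITH_UNDO = (1 - P_FAIL) * P_FAIL * P_FAIL
```

The exact figures come out slightly below 1% and 0.01% because Step1 has to succeed first, which doesn't change the point: the undo step cuts the manual-repair rate by about two orders of magnitude.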
|
|
If Step1's service doesn't expose an API to poll its status, the only recourse is to execute it again (with the same input key, assuming it's idempotent ;)
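
A rough Python sketch of what "execute it again with the same input key" could look like. `FakeStep1Service` and `ensure_step1` are hypothetical stand-ins, and the in-memory dict stands in for whatever deduplication the real service does. It assumes the service really is idempotent per key; if it isn't, re-executing can apply the effect twice.

```python
class FakeStep1Service:
    """Hypothetical in-memory service that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.side_effects = 0  # how many times the real work was performed

    def execute(self, idempotency_key: str, payload: str) -> str:
        # A repeated key returns the stored result instead of redoing the work.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.side_effects += 1
        result = f"done:{payload}"
        self._results[idempotency_key] = result
        return result


def ensure_step1(service: FakeStep1Service, idempotency_key: str,
                 payload: str, attempts: int = 3) -> str:
    """Re-execute Step1 with the SAME key until we get an answer back."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.execute(idempotency_key, payload)
        except ConnectionError as err:
            # We can't tell whether the call took effect, and there is no
            # status API to ask, so the only option is to send it again.
            last_error = err
    raise RuntimeError("Step1 outcome still unknown") from last_error
```

The key has to be derived from the input (or stored before the first attempt) so that every retry sends the same one; generating a fresh key per attempt would defeat the deduplication.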