by liampulles
13 days ago
I think the 80/20 solution for reliable workflows is:
- Ensure the workflow is idempotent - if it stops or fails at any point, you should be able to start it from scratch and skip / happily redo various elements.
- Store the messages which trigger workflows.
- Track failures (if your log aggregation is good, even that's enough to start).

Then when the odd thing fails (or sometimes a bunch of things fail, because e.g. a core integration goes down) you can look up the messages and have a little script or tool to go and re-queue them. This is an easy starting point that can keep you going for a long time until you really approach huge scale.
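A rough sketch of how those three pieces might fit together, using SQLite from the Python standard library as the message store. Everything named here (the `messages` and `done_steps` tables, `record_message`, `run_workflow`, `requeue_failed`) is made up for illustration and is not something from the comment above; a real setup would probably sit behind an actual queue.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical step type: a name plus a function that takes the message body.
Step = Tuple[str, Callable[[str], None]]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the message store (illustrative schema, not a standard one)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id TEXT PRIMARY KEY, body TEXT NOT NULL, "
        "status TEXT NOT NULL, error TEXT)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS done_steps ("
        "message_id TEXT, step TEXT, PRIMARY KEY (message_id, step))"
    )
    return db


def record_message(db: sqlite3.Connection, msg_id: str, body: str) -> None:
    """Store the triggering message; a redelivered duplicate is ignored."""
    db.execute(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, 'pending', NULL)",
        (msg_id, body),
    )
    db.commit()


def run_workflow(db: sqlite3.Connection, msg_id: str, steps: List[Step]) -> bool:
    """Run the workflow from the top, skipping steps already marked done."""
    row = db.execute("SELECT body FROM messages WHERE id = ?", (msg_id,)).fetchone()
    if row is None:
        raise KeyError(msg_id)
    body = row[0]
    try:
        for name, action in steps:
            done = db.execute(
                "SELECT 1 FROM done_steps WHERE message_id = ? AND step = ?",
                (msg_id, name),
            ).fetchone()
            if done:
                continue  # finished on an earlier attempt
            action(body)
            db.execute("INSERT INTO done_steps VALUES (?, ?)", (msg_id, name))
            db.commit()
    except Exception as exc:
        # Track the failure next to the stored message.
        db.execute(
            "UPDATE messages SET status = 'failed', error = ? WHERE id = ?",
            (repr(exc), msg_id),
        )
        db.commit()
        return False
    db.execute(
        "UPDATE messages SET status = 'done', error = NULL WHERE id = ?", (msg_id,)
    )
    db.commit()
    return True


def requeue_failed(db: sqlite3.Connection, steps: List[Step]) -> List[str]:
    """The 'little script': re-run every failed message, return ids that now pass."""
    failed = [
        r[0]
        for r in db.execute("SELECT id FROM messages WHERE status = 'failed'").fetchall()
    ]
    return [msg_id for msg_id in failed if run_workflow(db, msg_id, steps)]
```

Note that a step can still run twice if the process dies after the action succeeds but before its `done_steps` row is committed, so each action needs to be safe to redo on its own. The step markers only cut down on repeated work; they don't replace idempotent steps.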