by rockostrich
740 days ago
My org solved this problem for our use case (handling travel booking) by versioning workflow runs. Most of our runs are very short-lived, but some last for days because of a long-running polling process, e.g. waiting on a human to perform some kind of action. When we deploy a new version of the workflow, we keep the existing deployed version around until all of its in-flight runs have completed. Usually this takes a few minutes, but sometimes we need to wait days. We don't tie service releases 1:1 to workflow versions, in case we need a hotfix for a given workflow version, but the general pattern has worked very well for our use cases.
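A rough sketch of the bookkeeping this implies, not our actual code: every name here (`WorkflowRegistry`, `deploy`, `startRun`, `execute`, `completeRun`) is made up for illustration, and it assumes everything lives in memory in one process. New runs get pinned to the latest version, and an older version is dropped only once it has no in-flight runs left.

```typescript
// Hypothetical in-memory sketch; all names are illustrative assumptions.
type Handler = (input: string) => string;

interface DeployedVersion {
  handler: Handler;
  inFlight: Set<string>; // ids of runs still executing on this version
}

class WorkflowRegistry {
  private versions = new Map<number, DeployedVersion>();
  private runToVersion = new Map<string, number>();
  private latest = 0;

  // Deploy a new workflow version; older versions stay while they have runs.
  deploy(handler: Handler): number {
    this.latest += 1;
    this.versions.set(this.latest, { handler, inFlight: new Set<string>() });
    this.retireDrained();
    return this.latest;
  }

  // New runs are always pinned to the latest deployed version.
  startRun(runId: string): number {
    const current = this.versions.get(this.latest);
    if (!current) throw new Error("no workflow version deployed");
    current.inFlight.add(runId);
    this.runToVersion.set(runId, this.latest);
    return this.latest;
  }

  // A run keeps executing on the version it started with.
  execute(runId: string, input: string): string {
    const version = this.runToVersion.get(runId);
    const deployed = version === undefined ? undefined : this.versions.get(version);
    if (!deployed) throw new Error("unknown run " + runId);
    return deployed.handler(input);
  }

  completeRun(runId: string): void {
    const version = this.runToVersion.get(runId);
    if (version === undefined) throw new Error("unknown run " + runId);
    this.versions.get(version)?.inFlight.delete(runId);
    this.runToVersion.delete(runId);
    this.retireDrained();
  }

  // Remove every non-latest version whose in-flight runs have all completed.
  private retireDrained(): void {
    this.versions.forEach((v, num) => {
      if (num !== this.latest && v.inFlight.size === 0) this.versions.delete(num);
    });
  }

  deployedVersions(): number[] {
    const out: number[] = [];
    this.versions.forEach((_v, num) => out.push(num));
    return out.sort((a, b) => a - b);
  }
}
```

A hotfix for an old version would need a way to swap the handler of an existing entry, which this sketch leaves out.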
The only caveat is that we generally recommend you keep it to just a few minutes, and use delayed calls and our state primitives for effects that span longer than that. E.g., to poll repeatedly, a handler can delayed-call itself over and over, and to wait for a human, we have awakeables (https://docs.restate.dev/develop/ts/awakeables/).
More discussion: https://restate.dev/blog/code-that-sleeps-for-a-month/
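
To make the two patterns concrete, here is a minimal sketch in plain TypeScript. It deliberately does not use the Restate SDK: `pollUntilReady`, `createAwakeable`, and `resolveAwakeable` are hypothetical in-memory stand-ins, and nothing here is durable across restarts the way a real runtime's delayed calls and awakeables would be. The point is only the shape: each poll invocation does one check and returns, scheduling its own next call instead of sleeping in a loop, and a human wait is a promise keyed by an id that someone resolves later.

```typescript
// Hypothetical stand-ins, NOT the Restate API; everything is in memory.
type Scheduler = (fn: () => void, delayMs: number) => void;

const defaultScheduler: Scheduler = (fn, delayMs) => {
  setTimeout(fn, delayMs);
};

// Polling by delayed self-call: one check per invocation, then reschedule.
function pollUntilReady(
  check: () => boolean,
  delayMs: number,
  onReady: (attempts: number) => void,
  schedule: Scheduler = defaultScheduler,
  attempt: number = 1,
): void {
  if (check()) {
    onReady(attempt);
    return;
  }
  // Not ready yet: return now and let a later invocation try again.
  schedule(() => pollUntilReady(check, delayMs, onReady, schedule, attempt + 1), delayMs);
}

// Awakeable-like primitive: a promise that an external actor resolves by id.
const pendingAwakeables = new Map<string, (value: string) => void>();

function createAwakeable(id: string): Promise<string> {
  return new Promise<string>((resolve) => {
    pendingAwakeables.set(id, resolve);
  });
}

// Returns false if the id is unknown or was already resolved.
function resolveAwakeable(id: string, value: string): boolean {
  const resolve = pendingAwakeables.get(id);
  if (!resolve) return false;
  pendingAwakeables.delete(id);
  resolve(value);
  return true;
}
```

The scheduler is injectable so the polling logic can be driven without real timers; by default it falls back to `setTimeout`.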