Hacker News new | ask | show | jobs
by burntcaramel 966 days ago
Is this the problem where you have workflows live and in progress, yet you want to update the workflow and have things not break? Would you want to have multiple versions at once, or rather some way to migrate the in progress ones to the latest workflow definition?
2 comments

The latter.

For short lived workflows, you may not care about updating; just let it finish.

For longer jobs, you want some way to replace the current logic and either resume from where the job left off, or restart it idempotently. Especially if your workflow spans months or years (which at least some of these systems are designed for).

The challenge is these systens shine when you manage the job state in-memory, but they don't "store" the data in a traditional sense. They just replay your logic and replay the original I/O results. So if your logic changes, the replay breaks and your state goes bye-bye.

(I think of it similarly to React's "rules of hooks": you can't do anything that makes the function call key APIs in a different order than previous executions)

So you either accept that you can never update an in-flight job (in a meaningful way, at least), or you track job state in some other system and throw away the distinguishing feature of these systems.

I'm curious how people normally handle this. When I worked with Azure Durable Functions I couldn't find a way around this.

I think the problem works both ways too, because for an in progress (but long-running) workflow, you may not want to retroactively apply your new business logic for the part that's already run because that would be unexpected. But for the logic that hasn't run yet, you would certainly want the latest and greatest.

I wonder if there could be an approach where you have both versions live simultaneously, and introduce some sort of "checkpoint" into the old version that would act similar to a DB migration. When re-computing a workflow you could then start from the latest checkpoint, but any workflows that were created with the old version that haven't reached a checkpoint would continue to run the old code until it does.

You’d likely want either option depending on the lifetime.