by lorendsr 1136 days ago
For those not familiar with workflows as code, a workflow is a method that is executed in a way that can't fail—each step the program takes is persisted, so that if execution is interrupted (the process crashes or machine loses power), execution will be continued on a new machine, from the same step, with all local/instance variables, threads, and the call stack intact. It also transparently retries network requests that fail.
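A toy model may make the "each step is persisted" idea more concrete. This is only a sketch, not how Temporal is implemented: `DurableRun`, `step`, `Crash`, and `order_workflow` are hypothetical names, and a local JSON-lines file stands in for the database. Each completed step's result is journaled, and a fresh run replays journaled results instead of executing those steps again.

```python
import json
import os
import tempfile


class Crash(Exception):
    """Stands in for the process dying partway through a workflow."""


class DurableRun:
    """Toy executor: journals each completed step, replays the journal on restart."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.history = []
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                self.history = [json.loads(line) for line in f]
        self.position = 0
        self.executed = []  # steps actually executed by this process

    def step(self, name, fn):
        if self.position < len(self.history):
            # Already done in an earlier process: reuse the recorded result.
            result = self.history[self.position]["result"]
        else:
            result = fn()
            entry = {"step": name, "result": result}
            with open(self.journal_path, "a") as f:
                f.write(json.dumps(entry) + "\n")
            self.history.append(entry)
            self.executed.append(name)
        self.position += 1
        return result


def order_workflow(run, crash_before_ship=False):
    order_id = run.step("reserve", lambda: "order-1")
    charge = run.step("charge", lambda: 30)
    if crash_before_ship:
        raise Crash()
    run.step("ship", lambda: f"shipped {order_id}")
    return order_id, charge


path = os.path.join(tempfile.mkdtemp(), "journal.jsonl")

first = DurableRun(path)
try:
    order_workflow(first, crash_before_ship=True)
except Crash:
    pass  # the "machine" died after charging, before shipping

second = DurableRun(path)  # a new process picks up the same journal
result = order_workflow(second)

print(first.executed)   # ['reserve', 'charge']
print(second.executed)  # ['ship']
```

The second run gets `order_id` and `charge` back from the journal, so the card is not charged twice and the local variables look the same as before the crash.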

So it's great for any code that you want to ensure reliably runs, but having methods that can't fail also opens up new possibilities. For example, you can:

- Write a method that implements a subscription, charging a card and sleeping for 30 days in a loop. The `await Workflow.DelayAsync(TimeSpan.FromDays(30))` is transparently translated into a persisted timer that will continue executing the method when it goes off, and in the meantime doesn't consume resources beyond the timer record in the database.

- Store data in variables instead of a database, because you can trust that the variables will be accurate for the duration of the method execution, and execution spans server restarts!

- Write methods that last indefinitely and model an entity, like a customer, that maintains their loyalty program points in an instance variable. (Workflows can receive RPCs called Signals and Queries for sending data to the method ("User just made a purchase for $30, so please add 300 loyalty points") and getting data out from the method ("What's the user's points total?").)

- Write a saga that maintains consistency across services / data stores without manually setting up choreography or orchestration, with a simple try/catch statement. (A workflow method is like automatic orchestration.)
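For the saga bullet, here is a rough sketch of the try/catch shape in plain Python. It is not Temporal code: `book_trip`, `record`, `decline`, and `PaymentDeclined` are made-up names, and the "services" are just functions appending to a list. In a real workflow each action and compensation would be a call to another service, and the engine's durability is what lets the compensations still run after a crash.

```python
class PaymentDeclined(Exception):
    pass


def book_trip(steps):
    """steps is a list of (action, compensation) pairs of zero-argument callables."""
    undo = []
    try:
        for action, compensation in steps:
            action()
            # Only register the compensation once the action has succeeded.
            undo.append(compensation)
    except Exception:
        # Undo completed steps in reverse order.
        for compensation in reversed(undo):
            compensation()
        return "rolled back"
    return "booked"


log = []


def record(message):
    return lambda: log.append(message)


def decline():
    raise PaymentDeclined("card declined")


outcome = book_trip([
    (record("reserve flight"), record("cancel flight")),
    (record("reserve hotel"), record("cancel hotel")),
    (decline, record("refund card")),
])

print(outcome)  # rolled back
print(log)      # ['reserve flight', 'reserve hotel', 'cancel hotel', 'cancel flight']
```

The charge never succeeded, so "refund card" is never registered and never runs.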

6 comments

What happens if you have a workflow in progress that you want to change the implementation of? Eg. If I had a workflow that was waiting for 30 days but then decided I wanted that interval to be 7 days instead?
Temporal treats a workflow as non-deterministic if, upon code replay, the high-level commands don't match up with what happened during the original code execution. In this case, it's technically safe to change the code this way because both versions still result in timer commands. But the duration the timer was first created with on existing runs is what applies (there are reset approaches though, as mentioned in the other comment). However if, say, you changed the implementation to start a child workflow or activity before the timer, the commands would mismatch and you'd get a non-determinism error upon replay.
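A simplified model of that check, assuming only the kind of each command is compared (the real comparison has more detail than this). `replay`, `emit`, and the three workflow versions are hypothetical names for illustration.

```python
class NonDeterminismError(Exception):
    pass


def replay(workflow, history):
    """Re-run the workflow code and compare each emitted command kind with history."""
    emitted = []

    def emit(kind, **details):
        index = len(emitted)
        # Details such as the timer duration are deliberately ignored here.
        if index < len(history) and history[index] != kind:
            raise NonDeterminismError(
                f"command {index}: history has {history[index]!r}, code emitted {kind!r}"
            )
        emitted.append(kind)

    workflow(emit)
    return emitted


def v1(emit):
    emit("timer", days=30)
    emit("activity", name="charge")


def v2_shorter_timer(emit):
    emit("timer", days=7)
    emit("activity", name="charge")


def v3_activity_first(emit):
    emit("activity", name="notify")
    emit("timer", days=7)
    emit("activity", name="charge")


# An existing run of v1 that is still sleeping has recorded only the timer.
recorded = ["timer"]

safe = replay(v2_shorter_timer, recorded)  # fine: first command is still a timer

try:
    replay(v3_activity_first, recorded)
    mismatch_detected = False
except NonDeterminismError:
    mismatch_detected = True

print(safe)               # ['timer', 'activity']
print(mismatch_detected)  # True
```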

There are multiple strategies to update live workflows, see https://community.temporal.io/t/workflow-versioning-strategi.... Most often for small changes, people would use the `Workflow.Patched` API.
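As I understand the patching idea, a new execution records a marker and takes the new branch, while a replay takes the new branch only if the marker is already in that run's history. The sketch below models just that rule in plain Python; `make_patched`, `commands`, and the patch id are hypothetical, and this is not the SDK's implementation.

```python
def make_patched(markers, replaying):
    """markers: set of patch ids recorded in one run's history."""

    def patched(patch_id):
        if not replaying:
            markers.add(patch_id)  # first execution: record the marker
            return True
        return patch_id in markers  # replay: follow whatever the run did originally

    return patched


def commands(patched):
    if patched("notify-before-timer"):
        return ["activity:notify", "timer"]  # new code path
    return ["timer"]                         # old code path, kept for old runs


# A run started before the patch was deployed has no marker in its history.
old_run = commands(make_patched(set(), replaying=True))

# A run started after the deploy records the marker, then replays consistently.
new_markers = set()
new_run = commands(make_patched(new_markers, replaying=False))
new_run_replayed = commands(make_patched(new_markers, replaying=True))

print(old_run)           # ['timer']
print(new_run)           # ['activity:notify', 'timer']
print(new_run_replayed)  # ['activity:notify', 'timer']
```

Both generations of runs replay to the same commands they originally produced, which is what avoids the non-determinism error.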

After you deploy the new workflow code, you can reset [1] a workflow's execution state to before the DelayAsync statement was called, and then the workflow will sleep for 7 days from now.

That doesn’t take into account the time it’s already been waiting. For when you want to do something on a schedule and be able to edit the schedule in future, there’s a Schedules feature that allows you to periodically start a workflow, like a cron but more flexible. [2] In this case, the workflow code would be simpler, like just ChargeCustomer() and SendEmailNotification().

[1] https://docs.temporal.io/cli/workflow#reset

[2] https://docs.temporal.io/workflows#schedule

Ah, so that's what the technique is called. I've seen a similar approach used for point of sale systems that basically persisted on every button press, so if one crashed you could bring up exactly the same state on a different one simply by logging in.
It is a replica of Amazon Simple Workflow as far as I can see. Which is not a bad thing, SWF is great and does not get much attention.
That's no coincidence, Temporal is founded by the creators of Amazon Simple Workflow. See https://temporal.io/about.
What's the difference between workflows as code and using an event bus? Or is it the same? With RabbitMQ, if a machine dies whilst processing a message, it will automatically requeue that message so that another consumer can process it.
The major differences between an event bus and a workflow engine are that the workflow engine:

- Creates and updates events/messages for you

- Maintains consistency between the events and timers and the state of the processing flow. More info on why this is important for correctness/resilience: https://youtu.be/t524U9CixZ0

The difference between workflow code and using an event bus is that with the former, the above is done automatically for you, and in the latter, it's done manually, which can be a lot of code to write and get right, and is harder to track/visualize what happened in production and debug. It would also take a lot of events to get an equivalent degree of reliability—the message processor would need to do a single step and then write the next event to the bus. So a 10-line workflow-as-code function would translate to 10 different events in the bus route.

Also, the event bus route doesn't have the new possibilities I listed in the parent comment.
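To illustrate the one-event-per-step point, here is the same three-step process written both ways. An in-memory `deque` stands in for the bus, and all names (`run_on_bus`, `run_as_workflow`, the event names) are made up for the example. Neither version is durable; the sketch only shows the difference in code shape.

```python
from collections import deque


def on_order_placed(order):
    return "card_charged", {**order, "charged": True}


def on_card_charged(order):
    return "order_shipped", {**order, "shipped": True}


def on_order_shipped(order):
    return None  # end of the chain


HANDLERS = {
    "order_placed": on_order_placed,
    "card_charged": on_card_charged,
    "order_shipped": on_order_shipped,
}


def run_on_bus(order):
    """Each handler does one step, then publishes the event for the next step."""
    bus = deque([("order_placed", order)])
    trace = []
    payload = order
    while bus:
        event, payload = bus.popleft()
        trace.append(event)
        next_message = HANDLERS[event](payload)
        if next_message is not None:
            bus.append(next_message)
    return trace, payload


def run_as_workflow(order):
    """The same steps as one sequential function."""
    order = {**order, "charged": True}
    order = {**order, "shipped": True}
    return order


trace, final_order = run_on_bus({"id": 1})
print(trace)  # ['order_placed', 'card_charged', 'order_shipped']
print(final_order == run_as_workflow({"id": 1}))  # True
```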

Seems like this would depend on some storage guarantees, but I can’t find anything about that
Correct, the workflow's guarantee to always complete executing independent of process/hardware failures is dependent on the database not losing data. You host your workflow code with Temporal's Worker library, which talks to an instance of the Temporal Server [1], which is an open-source set of services (hosted by you or by Temporal Cloud), backed by Cassandra, MySQL, or Postgres. [2] So for instance increasing Cassandra's replication factor increases your resilience to disk failure.

[1] https://github.com/temporalio/temporal

[2] https://docs.temporal.io/clusters#persistence

Workflows, that is generally speaking the "boxes with arrows between them" pseudo-state machine, really a turing machine, are an interesting thing historically in the "enterprise computing" realm.

Sure you can claim this is a fundamentally lower level mechanism, but really it's the same thing.

It is probably important, like all enterprise software quagmires, to consider how they are sold.

Your typical IT manager with low code skills has all his documented high level processes his group manages. Hey look, Visio diagrams with point to point flows of boxes with arrows, and ... maybe ... some interactions of state with databases and/or systems domains.

They know just enough that the actual nuts and bolts is buried in all this ... code. Actual unintelligible code to them, even if they have some inkling of coding.

To them, it's simple flow, and his PHBs above him can understand what he does with the simple flows.

Then some workflow vendor walks in the door, pops up some visual editor, and wows him with basically "you don't need all that code, you put it into the workflow tool and it will just work, and BOOM your coding environment IS your simple visio document".

WOW SIGN ME UP TAKE $$$$$$$! Then comes the pilot flows.

Error handling and retries?

Distributed State and even worse, Transactional update of Distributed state?

Load distribution?

Branching, looping?

Systems integration?

It goes back to the theory of computation. State machines have limitations in processing. The stack machine addresses some of that, but it eventually runs out, requiring ... the turing machine.

As it turns out, almost all enterprise data flows or processes require Turing machines. That's why they are coded at some level in Turing-complete languages.

Superficially at a high level, you start to see a basic state machine model on top of that ... but it is an illusion.

You move the turing machine into the workflow engine (and the workflow engine IS a turing machine ... they all have them: state, looping, branching) and the "simple point to point" flow becomes spaghetti ... tool-locked in spaghetti, with fixed limits on ability to do things.

The current evolution to workflows is the "directed acyclic graph" workflow engine. This has been an improvement, mostly by constraining the actual use of workflow engines to task organizations that they can do, and trying to keep people from going "full Turing" in the workflow engine.

It still can loop ... most do it by recursive calls to subflows ... gets pretty spaghetti. And you still have the fundamental issue of what all PHBs will want in the workflow: on error, retry, or have a recovery flow, or something of that type. Still a huge amount of complexity to properly get the workflow working.

And yet the visual editing workflow tool can have enormous value. Enormous. The visual nature of the flow, the ability to visually diagnose suspended / failed executions. And workflows are everywhere: batch processes, code builds, deployments, automated maintenance, backups / restores, etc.

And I haven't even gotten into the mess of automated rules-engine-based stuff.

The only value in structuring low-level code along the lines of what "enterprise workflow" has evolved into after decades (useful, but not a holy grail) is if it gives you a fundamentally better way to visualize the execution of the code, which can happen under constrained use of workflow engines.

UML was a massive disaster, another tangential relation to what appears to be being done here. There your "workflows" or code diagrams were code generated to code.

Alas, the final problem of workflow engines is their balkanization. XML standardizations (BPEL) failed miserably for all the usual reasons corporate product standardizations do (functioned as lock-in for the existing players, lowest common denominator abilities, ugly, XKCD protocol+1).

If only... if only there was a well-designed representation scheme and a wide variety of good open source visualization and execution engines. But there aren't.

I think what is discussed here is a step towards a potential solution: it comes from the IDE tooling, something that a workflow always was (in the vein of the now-defunct CASE/Computer Aided Software Engineering days). A standard tooling that coders demand and IDEs provide as a minimum barrier. But IDEs are single machine things, and workflows are distributed entities ... sigh, nevermind that thought.

Ok, maybe we just need a good visualization tool first that is more universal. Don't care about the creation, just something that can "plug in" and represent non-workflow system interactions AS workflows. "Enterprise execution visualization". A REALLY good system for that has never existed IMO, and is universally needed.

Temporal (and similar systems like Cadence, AWS SWF, Azure Durable Functions) allows you more expressiveness and better DX than defining DAGs in a UI or markup file. You can write (almost) arbitrary code, and the library translates the code's actions into workflow steps.

> The Visual nature of the flow, ability to visually diagnose suspended / failed executions.

Temporal has a web UI in which you can see which executions are failing, and see on which step they're failing:

https://temporal.io/blog/temporal-ui-beta

A lot of these things, especially the good visualization you are referring to, are already possible with Netflix Conductor. It's OSS and free, and it also has a company called Orkes backing it.

I think they also have a C# SDK, which I can't vouch for because I haven't used it.