by shipp02 501 days ago
This seems like Temporal, only without as much server infrastructure and complexity. Either they ignore that complexity or it really is that simple.

Overall, really cool! Some of the scalability concerns raised here seem valid, but maybe you could have one Postgres server backing every few servers that need this kind of execution. Also, not every function should be its own step; functions need to be grouped into larger chunks so that each request generates fewer than 10 steps.


The example is overly simplified. It glosses over many of the subtle-but-important aspects of durable execution.

For example:

- Steps should be small, fallible operations, e.g. sending a request to an external service. You generally want to tailor the retry logic of each step to the specific task it performs. Doing too much in a step can increase failure rates or cause other problems due to the at-least-once behaviour of steps.

- The article makes a big deal of "Hello" being printed exactly 5 times even in the event of a crash, but durable execution doesn't guarantee this! You can never get exactly-once guarantees for side-effectful functions like this without cooperation from the other side. For example, if the external service supports idempotency via request IDs, you can generate an ID in a separate step and then use it in your request to get exactly-once behaviour. However, most services don't offer this. A crash during a step causes the step to re-run, so durable execution only gives you at-least-once behaviour.

- Triggering the workflow itself is a point of failure. In the example, the workflow decorator generates an ID for the workflow internally, but to trigger a workflow exactly once, the workflow ID needs to be generated externally.

- The solution is lightweight in terms of infrastructure, but not at all lightweight in terms of performance.
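A minimal Python sketch of the idempotency-key and external-workflow-ID points above. Everything in it is hypothetical: `workflow_log` is an in-memory stand-in for a durable step table (a real system would keep it in Postgres), `FakePaymentService` stands in for an external API that honours idempotency keys, and `step`/`checkout` are illustrative helpers, not the DBOS API. It shows the charge request reaching the service twice (at-least-once), yet only one charge is made because the key was generated and checkpointed in its own step.

```python
import uuid
from typing import Any, Callable, Dict

# Hypothetical stand-in for the durable step table; a real system would persist
# this in Postgres so that it survives process crashes.
workflow_log: Dict[str, Dict[str, Any]] = {}  # workflow_id -> {step_name: output}


class FakePaymentService:
    """Hypothetical external API that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self.requests = 0                   # how many requests actually arrived
        self.charges: Dict[str, int] = {}   # idempotency_key -> amount charged

    def charge(self, idempotency_key: str, amount: int) -> str:
        self.requests += 1
        # A repeated key does not create a second charge.
        self.charges.setdefault(idempotency_key, amount)
        return f"receipt-{idempotency_key}"


def step(workflow_id: str, name: str, fn: Callable[[], Any]) -> Any:
    """Replay the recorded output if this step already completed; otherwise run fn and record it."""
    done = workflow_log.setdefault(workflow_id, {})
    if name in done:
        return done[name]
    result = fn()        # a crash before the next line means fn runs again on recovery
    done[name] = result  # checkpoint only after fn has returned
    return result


def checkout(workflow_id: str, service: FakePaymentService, amount: int,
             crash_after_request: bool = False) -> str:
    # Generate the idempotency key in its own step so every retry reuses it.
    key = step(workflow_id, "make_key", lambda: str(uuid.uuid4()))

    def send_charge() -> str:
        receipt = service.charge(key, amount)
        if crash_after_request:
            raise RuntimeError("simulated crash before the step was checkpointed")
        return receipt

    return step(workflow_id, "charge", send_charge)


# The caller supplies the workflow ID (e.g. derived from an order number), so
# triggering the same order twice maps onto the same workflow record.
service = FakePaymentService()
try:
    checkout("order-1234", service, 500, crash_after_request=True)
except RuntimeError:
    pass                                        # "process crashed"; recovery follows
receipt = checkout("order-1234", service, 500)  # charge step re-runs with the same key
again = checkout("order-1234", service, 500)    # duplicate trigger: replayed, no request

print(service.requests, len(service.charges))   # prints: 2 1
```

Without the service-side deduplication in `charge`, the same run would produce two charges, which is the at-least-once behaviour described above.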

This is a great answer, and yes, those are critical aspects of durable execution. Maybe I should write a follow-on post that goes into more detail...
I'd love to read it. Getting exactly-once semantics is quite an interesting topic.
Thanks! DBOS is simpler not because it ignores complexity, but because it uses Postgres to deal with complexity. And Postgres is a very powerful tool for building reliable systems!
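One way a database can "deal with the complexity" is a checkpoint table keyed by workflow ID and step number, so a recovering workflow skips steps that already committed. The sketch below is an assumption-laden illustration, not the DBOS schema: it uses Python's built-in `sqlite3` as a stand-in for Postgres (the `INSERT ... ON CONFLICT ... DO NOTHING` statement is also valid Postgres), and the table and helper names are hypothetical. Note that it only skips *completed* steps; a crash between running `fn` and the commit still re-runs the step, which is the at-least-once caveat raised earlier in the thread.

```python
import json
import sqlite3
from typing import Any, Callable

# sqlite3 stands in for Postgres so the sketch runs anywhere; the table layout
# is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE step_outputs (
           workflow_id TEXT    NOT NULL,
           step_id     INTEGER NOT NULL,
           output      TEXT    NOT NULL,
           PRIMARY KEY (workflow_id, step_id)
       )"""
)

SELECT_SQL = "SELECT output FROM step_outputs WHERE workflow_id = ? AND step_id = ?"


def run_step(workflow_id: str, step_id: int, fn: Callable[[], Any]) -> Any:
    """Return the recorded output of a finished step, or run fn and record its output."""
    row = conn.execute(SELECT_SQL, (workflow_id, step_id)).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already completed: skip fn entirely
    result = fn()  # a crash here (before the commit below) re-runs fn on recovery
    with conn:  # commits on success, rolls back on error
        # The primary key rules out duplicate records; if another executor
        # recorded this step first, its output is kept instead of ours.
        conn.execute(
            "INSERT INTO step_outputs (workflow_id, step_id, output) VALUES (?, ?, ?) "
            "ON CONFLICT (workflow_id, step_id) DO NOTHING",
            (workflow_id, step_id, json.dumps(result)),
        )
    return json.loads(conn.execute(SELECT_SQL, (workflow_id, step_id)).fetchone()[0])


printed = []


def say_hello() -> str:
    printed.append("Hello")
    return "Hello"


def hello_workflow(workflow_id: str) -> None:
    for i in range(5):
        run_step(workflow_id, i, say_hello)


hello_workflow("wf-1")
hello_workflow("wf-1")  # simulated recovery: every step is replayed from the table
print(len(printed))     # prints: 5
```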
Temporal has the option of using Postgres as the persistence backend. Presumably, the simplicity of DBOS comes from not having to spin up a web server and workflow engine to orchestrate the functions?
Have you had scalability issues because your tables got too big?

Is there a mechanism to GC workflows that are completed?

Tables getting too big hasn't been a concern in practice because information on completed workflows can easily be GC'ed.
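A minimal sketch of what GC of completed workflows could look like, assuming a hypothetical two-table layout (`workflow_status` and `step_outputs`) and hypothetical status values `'SUCCESS'`/`'PENDING'`; `sqlite3` again stands in for Postgres. Only workflows that finished successfully more than a retention window ago are deleted, so pending workflows are never touched and a recently completed workflow can still be deduplicated if a caller re-triggers it with the same external ID.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema; sqlite3 stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE workflow_status (
    workflow_id TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    updated_at  REAL NOT NULL)""")
conn.execute("""CREATE TABLE step_outputs (
    workflow_id TEXT    NOT NULL,
    step_id     INTEGER NOT NULL,
    output      TEXT,
    PRIMARY KEY (workflow_id, step_id))""")


def gc_completed(conn: sqlite3.Connection, older_than_seconds: float,
                 now: Optional[float] = None) -> int:
    """Delete successful workflows (and their steps) older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - older_than_seconds
    with conn:  # both deletes commit together or not at all
        conn.execute(
            """DELETE FROM step_outputs WHERE workflow_id IN (
                   SELECT workflow_id FROM workflow_status
                   WHERE status = 'SUCCESS' AND updated_at < ?)""",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM workflow_status WHERE status = 'SUCCESS' AND updated_at < ?",
            (cutoff,),
        )
    return cur.rowcount  # number of workflows removed


now = 1_000_000.0
rows = [("old-done", "SUCCESS", now - 10 * 86400),
        ("new-done", "SUCCESS", now - 60),
        ("old-pending", "PENDING", now - 10 * 86400)]
conn.executemany("INSERT INTO workflow_status VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO step_outputs VALUES (?, ?, ?)",
                 [(wid, 0, "{}") for wid, _, _ in rows])
conn.commit()

deleted = gc_completed(conn, older_than_seconds=7 * 86400, now=now)
print(deleted)  # prints: 1
```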