by unoti 563 days ago

Neat ideas here. I've listed 3 thoughts/concerns:

    1. Deadlocks
    2. Programmer Experience
    3. Updating the code with in-flight tasks

To avoid deadlocks, it seems like the executor would need to know what the dependencies are before attempting to execute tasks. With 4 threads of execution, we could get into a state where all 4 threads are blocked on semaphores, each waiting for another task to provide a value it depends on. And at scale, if it can happen, it eventually will.
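The starvation scenario above can be reproduced with nothing but the standard library. This is a minimal sketch, not the project's engine: `all_values_arrived`, `consumer`, and `producer` are hypothetical names, and a `threading.Event` stands in for the semaphore. The consumers wait with a timeout so the demo terminates instead of hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def all_values_arrived(num_workers, num_consumers=4, timeout=0.3):
    """Return True if every consumer got its value before giving up."""
    ready = threading.Event()  # stands in for the dependent value

    def consumer():
        # Occupies a worker thread until the producer runs or we time out.
        return ready.wait(timeout)

    def producer():
        ready.set()

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        consumers = [pool.submit(consumer) for _ in range(num_consumers)]
        pool.submit(producer)  # queued behind the consumers
        return all(f.result() for f in consumers)


starved = all_values_arrived(num_workers=4)  # producer has no free thread
healthy = all_values_arrived(num_workers=5)  # one spare thread for the producer
```

With 4 workers, the producer cannot start until at least one consumer has given up, so `starved` is False. With a fifth worker, `healthy` is True. Without the timeout, the 4-worker case would block forever.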

Potentially related-- it could make sense for the engine to give preference to completing partially completed tasks before starting new fresh tasks.
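One way to express that preference is a ready queue ordered by how far along each workflow already is. This is a sketch under assumptions, not something the project does: `ProgressFirstQueue` and the `steps_done` counter are hypothetical, and a real engine would have to track progress itself.

```python
import heapq
import itertools


class ProgressFirstQueue:
    """Ready queue that pops the task whose workflow is furthest along."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, task_id, steps_done):
        # heapq is a min-heap, so negate progress to pop the largest first.
        heapq.heappush(self._heap, (-steps_done, next(self._order), task_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = ProgressFirstQueue()
queue.push("wf1:start", steps_done=0)
queue.push("wf2:start", steps_done=0)
queue.push("wf0:step3", steps_done=2)  # continues a workflow already in progress
order = [queue.pop() for _ in range(len(queue))]
```

Here `order` comes out as `wf0:step3`, `wf1:start`, `wf2:start`: the in-progress workflow runs first, and fresh workflows keep their arrival order.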

Also, I wonder if there's a way to lay the specification of tasks out so it looks more like normal code-- potentially with async await. What I mean by normal code is this: It's much more natural to write a program with 2 steps like this:

    name = await ask_for_name()
    await greet(name)

than to redo it in a task style like this:

    def ask_for_name():
        ...
        return TaskResult(next_task=greet(name))

    def greet(name):
        ...

If I have 7 steps with some loops and conditionals, it becomes much more difficult to grok the structure with the disjointed task structure, and much more difficult to restructure and reorganize it. I wonder if there's a way to pull it off where the distributed task structure is still there but the code feels more natural. Using this framework we're essentially writing the code in a fragmented way that's challenging to reorganize and reason about.

Finally, what happens when we change the task code and deploy a new version? Specifically, what happens to the tasks that were in flight at the moment of the deploy?

1 comment

skull8888888 563 days ago

Thank you for your comment. I wanted to add some clarifications.

1. Tasks are not explicitly called from another task. In your example, greet() is never called; instead, a task with id=greet is pushed to the queue.

2. The reason I opted for the distributed task approach is precisely to eliminate chains like await task_1, await task_2, ...

Going back to point 1: task_1 just says to the engine, "ok buddy, now it is time to spawn task_2." With those semantics, tasks stay isolated, and we don't have to deal with outer tasks that call other tasks. Parallel task execution is also extremely simple with that approach.

3. Deadlocks will happen iff you wait for data that is never assigned, which is expected. Otherwise, given the design of the state and the engine itself, they will never happen.

https://github.com/lmnr-ai/flow/blob/main/src/lmnr_flow/stat...

https://github.com/lmnr-ai/flow/blob/main/src/lmnr_flow/flow...

4. For your last point, I would argue the opposite is true: it's actually much harder to maintain and change things when you hardcode everything, which is why this project exists in the first place.

5. Regarding deployment: Flow is not Temporal-like (yet). Everything is in-memory, but I will def look into how to make it more robust.
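To make points 1 and 2 concrete, here is a minimal single-threaded sketch of the "push the next task id to the queue" idea. It is not Flow's actual API: `run_flow`, the plain-dict state, and the convention of returning a list of next ids are all assumptions made for illustration.

```python
from collections import deque


def run_flow(tasks, start_id):
    """Run tasks by id; each task returns the ids to enqueue next."""
    state = {}                 # shared key/value context
    queue = deque([start_id])  # ids waiting to run
    trace = []                 # order in which tasks actually ran
    while queue:
        task_id = queue.popleft()
        trace.append(task_id)
        next_ids = tasks[task_id](state) or []
        queue.extend(next_ids)  # the engine, not the task, starts the next step
    return state, trace


def ask_for_name(state):
    state["name"] = "Ada"  # stand-in for real user input
    return ["greet"]       # names the next task, never calls it


def greet(state):
    state["greeting"] = f"Hello, {state['name']}!"
    return None            # nothing left to schedule


state, trace = run_flow(
    {"ask_for_name": ask_for_name, "greet": greet},
    start_id="ask_for_name",
)
```

Because a task only names its successors, returning several ids from one task is all it takes to fan out, and a real engine could hand those ids to parallel workers.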