|
|
|
|
|
by RangerScience
733 days ago
|
|
Helpful comment! If I could pick your brain... I'm looking at a green field implementation of a task system, for human tasks - people need to do a thing, and then mark that they've done it, and that "unlocks" subsequent human tasks, and near as I can tell the overall task flow is a DAG. I'm currently considering how (if?) to allow for complex logic about things like which tasks are present in the overall DAG - things like skipping a node based on some criteria (which, it occurs to me in typing this up, can benefit from your above advice, as that can just be a configured function call that returns skip/no-skip) - and, well... thoughts? (: |
|
However, if each node is more of a data check/wait (e.g. we’re on this step until you tell me you completed some task in the real world), then it would seem rather than your DAG orchestrating nodes of logic, the DAG itself is the logic. In this case, i think you have a few options, though Airflow itself is probably not something I would recommend for such a system.
In the case of the latter, there are a lot of specifics to consider in how it’s used. Is this a universal task list, where there is exactly one run of this DAG (e.g. tracking tasks at a company level), or would you have many independent runs of this (e.g. many users use it), are runs of it regularly scheduled (e.g. users run it daily, or as needed).
Without knowing a ton about your specifics, a pattern I might consider could be isolating your logic from your state, such that you have your logical DAG code, baked into a library of reusable components (a la the above), and then allowing those to accept configuration/state inputs that allow them to route logic appropriately. As a task is completed, update your database with the state as it relates to the world, not its place in the DAG. This will keep your state isolated from the logic of the DAG itself, which may or may not be desirable, depending on your objectives and design parameters.