Hacker News
by jaz46 3396 days ago
Does the other tool you're talking about that works with Airflow allow you to scale your Airflow tasks based on, for example, the amount of new input data that needs to be processed? That is one of the major challenges we see in bioinformatics workloads. Sometimes you have a few new samples to run and other times there are thousands, so your task scheduler, although it is task-centric, needs to have an understanding of the data too.
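To make the question concrete, here is a minimal sketch, in plain Python rather than any workflow engine's API, of what "scaling tasks with the data" could mean: look at how many new samples arrived and plan one task per fixed-size batch. The function `plan_batches` and its `max_per_task` parameter are hypothetical, chosen only for illustration.

```python
def plan_batches(new_samples, max_per_task):
    """Split newly arrived samples into batches, one batch per task.

    The number of planned tasks grows with the amount of new input data:
    a run with a handful of samples yields one task, a run with thousands
    yields many. (Hypothetical helper, not part of Airflow.)
    """
    if max_per_task < 1:
        raise ValueError("max_per_task must be at least 1")
    return [
        new_samples[i:i + max_per_task]
        for i in range(0, len(new_samples), max_per_task)
    ]


few = plan_batches([f"sample_{i}" for i in range(3)], max_per_task=100)
many = plan_batches([f"sample_{i}" for i in range(2500)], max_per_task=100)
print(len(few), len(many))  # 1 25
```

The hard part the question points at is that the scheduler has to inspect the input before it knows how many tasks to create.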

At a high level [for Airflow specifically], the scheduler or workflow engine cares most about the tasks and their dependencies, and is somewhat agnostic about the units of work it triggers.
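As a rough illustration of that separation (a toy model, not Airflow's actual scheduler), the sketch below runs callables in dependency order using Python's standard-library `graphlib` (Python 3.9+). The function `run_dag` and the task names are hypothetical; the point is that the scheduler sees only task ids and dependency edges, never the data each callable processes.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have run.

    tasks: mapping of task_id -> zero-argument callable (the unit of work).
    dependencies: mapping of task_id -> set of upstream task_ids.
    static_order() raises graphlib.CycleError if the edges form a cycle.
    """
    # Include every task as a node, even ones with no upstream dependencies.
    graph = {task_id: dependencies.get(task_id, set()) for task_id in tasks}
    order = []
    for task_id in TopologicalSorter(graph).static_order():
        tasks[task_id]()  # the scheduler only triggers the callable
        order.append(task_id)
    return order


log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
print(run_dag(tasks, deps))  # ['extract', 'transform', 'load']
```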

It's possible to use a feature called XCom as a message bus between tasks, but I would typically steer people toward stateless, idempotent, "contained" units of work and have them avoid cross-task communication as much as possible. https://airflow.incubator.apache.org/concepts.html#xcoms
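A minimal sketch of what a "contained", idempotent unit of work might look like, using a plain dict as a stand-in for real storage (an assumption; in practice this would be a filesystem, object store, or database). The task derives everything it needs from its run date, takes nothing from upstream tasks, and overwrites its output partition, so running it twice leaves the same state as running it once. The names `process_partition` and `store` are hypothetical.

```python
from datetime import date


def process_partition(store, run_date):
    """Idempotent, self-contained task (hypothetical example).

    Reads one input partition and overwrites one output partition, both
    keyed by run_date alone. No state is passed in from other tasks, so
    retries and backfills for the same date produce the same result.
    """
    key = run_date.isoformat()
    samples = store["input"].get(key, [])
    # Placeholder "work"; the important part is overwrite, never append.
    store["output"][key] = sorted(s.upper() for s in samples)
    return store["output"][key]


store = {"input": {"2020-01-01": ["b", "a"]}, "output": {}}
first = process_partition(store, date(2020, 1, 1))
second = process_partition(store, date(2020, 1, 1))  # e.g. a retry
print(first == second, store["output"])  # True {'2020-01-01': ['A', 'B']}
```

Because the task never reads another task's output through a side channel, the scheduler is free to retry or rerun it without coordinating with anything else.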

For your case [which I have little input on], I think the singleton DAGs described in another post on this page may work.