by EQVEYWDCHQ 1962 days ago
Why would I use this over airflow?
3 comments

Here's a conference talk that gives a detailed side-by-side comparison: https://youtu.be/oXPgX7G_eow

I'm strongly considering moving my fairly immature Airflow pipeline to Argo Workflows because:

* the Airflow DAG deploy/versioning is surprisingly primitive. The best option here seems to be to use the KubernetesPodOperator to version your steps, and if you're using k8s to execute, why not use it for the rest?

* the Airflow UI is pretty confusing to use; maybe this gets easier once you know your way around it.

* my team has k8s expertise and we don't know Airflow well yet; seems like less to learn running Argo Workflows, assuming you're already fluent in k8s.

* if you're already running k8s, it seems like you have to add fewer components to get Argo running; more duplication with Airflow-on-k8s.

On the other hand, being able to unit test / locally run your DAGs on your dev machine is a big plus for Airflow, whereas Argo Workflows seems to have a weaker testing story. And I'd rather write Python DAG files than YAML.
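To make the "unit test your DAGs locally" point concrete, here is a minimal sketch in plain Python. It does not use the Airflow API; the `PIPELINE` dict and both helper functions are hypothetical names, and the only assumption is that a pipeline can be described as a mapping from each task to the tasks it depends on.

```python
from graphlib import CycleError, TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(graph):
    """Return the tasks in an order that respects their dependencies."""
    return list(TopologicalSorter(graph).static_order())


def is_valid_dag(graph):
    """A pipeline is only a DAG if its dependency graph has no cycles."""
    try:
        execution_order(graph)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(is_valid_dag({"a": {"b"}, "b": {"a"}}))
```

A check like this runs on a dev machine in milliseconds, which is the kind of feedback loop that is harder to get when the pipeline only exists as YAML submitted to a cluster.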

Hi @theptip,

I'm on the Airflow PMC and would love to know a bit more about your comparison :).

1. Have you tried Airflow 2.0? We made some pretty big overhauls both in terms of UI and backend.

2. DAG versioning is currently problematic, but DAG versioning is a "when" and not an "if", so it should be in a future 2.x version :). That said, could you describe a bit more about your deployment issues? User stories like this help us improve the product.

3. Have you looked into using KEDA with the CeleryExecutor? You could create KEDA queues for a lot of commonly used workflows, and then you'd only need to use the Python or Bash operator to run those tasks instead of the KubernetesPodOperator.

4. Are you using the Airflow Helm chart, or did you roll a custom deployment?

Any feedback would be highly appreciated and I'm also glad to answer any questions you might have!

Thanks for getting in touch, happy to share.

1. We ended up using GCP's hosted Composer to get started more quickly, which doesn't seem to have been updated to Airflow 2.0 yet. I'll put that on the list for evaluation.

2. A few use cases where I immediately hit complexity walls:

A) Having a "staging" version of our pipelines so that we don't break the prod ETL; it was really difficult to find a canonical method for having common DAG code that's parameterizable per env. The fact that all of the DAGs live side-by-side in the same directory means I have to run the same job for a "prod push" as a "staging push" (i.e. if I get the staging deploy wrong I could break prod). Given that we deploy version vN+1 to staging, check it's working, and only then deploy vN+1 to prod, we ended up with some weird config injection code to let us have two folders containing copies of the same DAG scripts with different config. This just felt janky.
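For what it's worth, the pattern I was looking for in (A) is roughly the following: one copy of the pipeline code, with everything environment-specific pulled into a config object. This is only a sketch under assumptions: `EnvConfig`, `CONFIGS`, the `PIPELINE_ENV` variable, and the dataset names are all hypothetical, and `build_pipeline` returns a plain dict where real code would construct the Airflow DAG.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    name: str
    warehouse_dataset: str
    schedule: str


# Hypothetical per-environment settings; real values would come from your infra.
CONFIGS = {
    "staging": EnvConfig("staging", "analytics_staging", "@hourly"),
    "prod": EnvConfig("prod", "analytics", "@daily"),
}


def load_config(env=None):
    """Pick the config by explicit name, else by the (assumed) PIPELINE_ENV variable."""
    env = env or os.environ.get("PIPELINE_ENV", "staging")
    if env not in CONFIGS:
        raise ValueError(f"unknown environment: {env!r}")
    return CONFIGS[env]


def build_pipeline(cfg):
    """Describe the pipeline for one environment.

    In Airflow, this is where the DAG object would be built from cfg.
    """
    return {
        "dag_id": f"etl_{cfg.name}",
        "schedule": cfg.schedule,
        "tasks": ["extract", "transform", f"load_into_{cfg.warehouse_dataset}"],
    }


if __name__ == "__main__":
    print(build_pipeline(load_config("staging")))
```

The point of the design is that staging and prod share one code path and differ only in data, so a bad staging deploy cannot touch the prod definition unless the config itself is wrong.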

B) Managing Python dependencies between different apps was also painful. For example, we wanted to add Meltano, and that app brings in a bunch of deps, which broke our main DAGs when I naively updated the main Python pip env to install the new Meltano requirement. Using the KubernetesPodOperator lets us effectively have a venv per DAG, but the pattern of using one Python env across the whole Airflow install bit me very early on and seemed pretty unscalable.
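The failure mode in (B) can be caught before it breaks anything with a small check like this one. It is a rough sketch: the function names and the package pins in the example are made up, and it only compares exact `==` pins, which is far less than what pip's resolver actually does.

```python
def parse_pins(requirements):
    """Map package name -> pinned version for lines of the form 'name==version'."""
    pins = {}
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            continue  # this sketch ignores anything that is not an exact pin
        pins[name.strip().lower()] = version.strip()
    return pins


def find_conflicts(shared, incoming):
    """Return packages pinned to different versions in the two requirement lists."""
    shared_pins = parse_pins(shared)
    incoming_pins = parse_pins(incoming)
    return {
        name: (shared_pins[name], incoming_pins[name])
        for name in shared_pins.keys() & incoming_pins.keys()
        if shared_pins[name] != incoming_pins[name]
    }


if __name__ == "__main__":
    # Hypothetical pins, not the real requirements of any project.
    main_dags = ["libalpha==1.3.0", "libbeta==2.0.1"]
    new_app = ["libalpha==2.1.0", "libgamma==0.4.0"]
    print(find_conflicts(main_dags, new_app))
```

Any non-empty result is a sign that the new app wants its own environment (a separate image or venv) rather than a slot in the shared one.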

3. I haven't looked at KEDA, I'll take a look.

4. We're using GCP Composer for now, though I looked at the Helm chart too.

There's actually a python dsl that compiles to YAML: https://github.com/argoproj-labs/argo-python-dsl
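To show the general idea of generating the manifest from Python, here is a sketch that does not use that DSL at all: it builds a minimal Workflow manifest as a plain dict and serializes it with the standard library. The function names, image, and message are placeholders I made up, and the output is JSON rather than YAML, on the assumption that your tooling accepts JSON manifests.

```python
import json


def hello_workflow(image="alpine:3.12", message="hello"):
    """Build a minimal single-step Argo Workflow manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "hello-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    "container": {
                        "image": image,
                        "command": ["echo"],
                        "args": [message],
                    },
                }
            ],
        },
    }


def render(manifest):
    """Serialize the manifest; JSON is used here to stay within the stdlib."""
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(render(hello_workflow(message="hello from python")))
```

Because the manifest is ordinary data until it is rendered, it can be parameterized and unit tested like any other Python value.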
If you prefer YAML to Python, or if you prefer installing one k8s app instead of managing all of the Airflow components (scheduler, webserver, workers, etc.)
and I'm sitting here wondering, why would you use Airflow over Argo Workflows?

Everyone should evaluate the options for their own needs.