by dimberman 1963 days ago
Hi @theptip,

I'm an Airflow PMC member and would love to know a bit more about your comparison :).

1. Have you tried Airflow 2.0? We made some pretty big overhauls to both the UI and the backend.

2. DAG versioning is currently problematic, but it's a "when" and not an "if", so it should land in a future 2.x version :). That said, could you describe your deployment issues in a bit more detail? User stories like this help us improve the product.

3. Have you looked into using KEDA with the CeleryExecutor? You could create KEDA queues for a lot of commonly used workflows, and then you'd only need the Python or Bash operator to run those tasks instead of the KubernetesPodOperator.

4. Are you using the Airflow Helm chart, or did you roll a custom deployment?

Any feedback would be highly appreciated and I'm also glad to answer any questions you might have!

1 comments

Thanks for getting in touch, happy to share.

1. We ended up using GCP's hosted Composer to get started more quickly, which doesn't seem to have been updated to Airflow 2.0 yet. I'll put that on the list for evaluation.

2. A few use cases where I immediately hit complexity walls:

A) Having a "staging" version of our pipelines so that we don't break the prod ETL; it was really difficult to find a canonical method for having common DAG code that's parameterizable per env. The fact that all of the DAGs live side-by-side in the same directory means I have to run the same job for a "prod push" as a "staging push" (i.e. if I get the staging deploy wrong I could break prod). Given that we deploy version vN+1 to staging, check it's working, and only then deploy vN+1 to prod, we ended up with some weird config injection code to let us have two folders containing copies of the same DAG scripts with different config. This just felt janky.
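
To make the "common DAG code that's parameterizable per env" idea concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the `EnvConfig` fields, the dataset names, and the `build_pipeline_spec` helper are made up for illustration and are not our actual config injection code or an Airflow API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical per-environment settings; names and values are illustrative only.
@dataclass(frozen=True)
class EnvConfig:
    name: str
    warehouse_dataset: str
    schedule: Optional[str]  # None means "trigger manually"


ENVIRONMENTS = {
    "staging": EnvConfig("staging", "analytics_staging", None),
    "prod": EnvConfig("prod", "analytics", "@daily"),
}


def build_pipeline_spec(env: str) -> dict:
    """Return the settings that one shared DAG definition would be built from."""
    cfg = ENVIRONMENTS[env]
    return {
        "dag_id": f"etl_{cfg.name}",  # distinct ID per environment
        "schedule": cfg.schedule,
        "params": {"dataset": cfg.warehouse_dataset},
    }


if __name__ == "__main__":
    for env_name in ENVIRONMENTS:
        print(build_pipeline_spec(env_name))
```

A single DAG file could pass each spec to the DAG constructor, so the pipeline logic exists once and only the config differs. Note that this only addresses the duplicated scripts; it does nothing about staging and prod DAGs being pushed by the same deploy job.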

B) Managing Python dependencies between different apps was also painful; for example, we wanted to add Meltano, and that app brings in a bunch of deps, which broke our main DAGs when I naively updated the main Python pip env to install the new Meltano requirement. Using the K8s operator lets us effectively have a venv per DAG, but the pattern of using one Python env across the whole Airflow install bit me very early on and seemed pretty unscalable.
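
As a rough illustration of the "venv per DAG" idea without Kubernetes, the sketch below gives each job its own virtual environment using only the standard library. The helper names (`venv_python`, `ensure_env`, `run_isolated`) and the directory layout are assumptions for the example, not anything Airflow or Composer provides.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import Sequence


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_env(env_dir: Path, requirements: Sequence[str]) -> Path:
    """Create the environment on first use and install only its own requirements."""
    python = venv_python(env_dir)
    if not python.exists():
        venv.EnvBuilder(with_pip=True).create(env_dir)
        if requirements:
            subprocess.run(
                [str(python), "-m", "pip", "install", *requirements],
                check=True,
            )
    return python


def run_isolated(env_dir: Path, requirements: Sequence[str], script: Path) -> int:
    """Run `script` with its own environment's interpreter and return the exit code."""
    python = ensure_env(env_dir, requirements)
    return subprocess.run([str(python), str(script)]).returncode
```

With something like this, a call such as `run_isolated(Path("envs/meltano"), ["meltano"], Path("jobs/extract.py"))` (paths hypothetical) would install Meltano's deps into their own environment and leave the main Airflow env alone. A container per task gives stronger isolation, since it also covers system libraries.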

3. I haven't looked at KEDA, I'll take a look.

4. We're using GCP Composer for now, though I looked at the Helm chart too.