by theptip 1962 days ago

Thanks for getting in touch, happy to share.

1. We ended up using GCP's hosted Composer to get started more quickly, which doesn't seem to have been updated to Airflow 2.0 yet. I'll put that on the list for evaluation.

2. A few use cases where I immediately hit complexity walls:

A) Having a "staging" version of our pipelines so that we don't break the prod ETL. It was really difficult to find a canonical method for sharing common DAG code that's parameterizable per env. Because all of the DAGs live side by side in the same directory, a "staging push" runs the same deploy job as a "prod push" (i.e. if I get the staging deploy wrong I could break prod). Since we deploy version vN+1 to staging, check it's working, and only then deploy vN+1 to prod, we ended up with some weird config-injection code that lets us keep two folders containing copies of the same DAG scripts with different config. This just felt janky.

B) Managing Python dependencies between different apps was also painful. For example, we wanted to add Meltano, which brings in a bunch of deps, and those broke our main DAGs when I naively updated the main pip environment to install the new Meltano requirement. Using the K8s operator effectively gives us a venv per DAG, but the pattern of using one Python env across the whole Airflow install bit me very early on and seemed pretty unscalable.
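
To make (A) concrete, here is a rough sketch of the kind of per-env parameterization I was looking for, not what we actually run. Everything named here is made up for illustration: the `PIPELINE_ENV` variable, the `EnvConfig` fields, the dataset names, and the shape of the returned settings dict. It is plain Python with no Airflow import; a DAG file would still have to feed these values into its own DAG definition.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    """Per-environment settings (hypothetical fields, for illustration)."""
    name: str
    dataset: str
    schedule: Optional[str]


# Hypothetical environments: staging only runs when triggered by hand,
# prod runs daily against the real dataset.
ENVIRONMENTS = {
    "staging": EnvConfig(name="staging", dataset="analytics_staging", schedule=None),
    "prod": EnvConfig(name="prod", dataset="analytics", schedule="@daily"),
}


def load_env_config(environ: Mapping[str, str] = os.environ) -> EnvConfig:
    """Pick the config from PIPELINE_ENV (an assumed variable name).

    Defaults to staging so a misconfigured deploy never targets prod.
    """
    env = environ.get("PIPELINE_ENV", "staging")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown PIPELINE_ENV: {env!r}")
    return ENVIRONMENTS[env]


def dag_settings(base_dag_id: str, cfg: EnvConfig) -> dict:
    """Build env-specific settings from one shared DAG definition."""
    return {
        # Suffix the id so staging and prod copies never collide.
        "dag_id": f"{base_dag_id}_{cfg.name}",
        "schedule": cfg.schedule,
        "params": {"dataset": cfg.dataset},
    }


if __name__ == "__main__":
    print(dag_settings("daily_etl", load_env_config()))
```

The point is that one copy of the DAG code reads its env from outside, instead of two folders of near-identical scripts. It does not solve the "same deploy job touches both envs" problem by itself.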

3. I haven't looked at KEDA yet; I'll take a look.

4. We're using GCP Composer for now, though I looked at the Helm chart too.
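
Going back to 2B, the "venv per DAG" idea roughly amounts to pinning each app to its own container image, so installing Meltano never touches the deps of the main DAGs. A minimal sketch under assumptions: the image names, the `APP_IMAGES` mapping, and the `pod_task_kwargs` helper are all hypothetical. The dict keys are meant to line up with arguments that Airflow's `KubernetesPodOperator` accepts (`task_id`, `name`, `image`, `cmds`, `arguments`), but nothing here imports Airflow.

```python
from typing import Dict, List

# Hypothetical registry: each app gets its own image with its own
# pinned Python dependencies, built and versioned independently.
APP_IMAGES: Dict[str, str] = {
    "main_etl": "registry.example.com/main-etl:1.4.0",
    "meltano": "registry.example.com/meltano:2.0.0",
}


def pod_task_kwargs(app: str, command: List[str]) -> dict:
    """Build keyword arguments for a pod-based task that runs `command`
    inside the image belonging to `app`."""
    if app not in APP_IMAGES:
        raise KeyError(f"no image registered for app {app!r}")
    if not command:
        raise ValueError("command must not be empty")
    return {
        "task_id": f"run_{app}",
        # Kubernetes object names cannot contain underscores.
        "name": f"{app.replace('_', '-')}-pod",
        "image": APP_IMAGES[app],
        "cmds": command[:1],        # the executable
        "arguments": command[1:],   # everything after it
    }


if __name__ == "__main__":
    print(pod_task_kwargs("meltano", ["meltano", "run", "tap-x", "target-y"]))
```

The Airflow workers then only need the deps for Airflow itself, and upgrading one app's image can't break another app's tasks.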