Hacker News
by zukzuk 1588 days ago
Everyone's context is different, but I've found the exact opposite to be true. Airflow is simple and dumb enough that it can be easily understood and managed by a small team, but it's also flexible and powerful enough that we can't come up with a good enough reason to switch to anything else.*

*We are, however, becoming more and more reliant on dbt, and the article makes a good point about Airflow providing no visibility into what's going on inside a dbt node. So we're ending up with an increasingly simple Airflow DAG, with most of the complexity hidden inside a single dbt node.

2 comments

This reflects how I often deploy Airflow as well (usually on GCP as Composer).

We use dbt to manage the DAG for the BigQuery transformations, package it in a container, and deploy it as a single node into the Kubernetes cluster that Airflow is running on.

Airflow can then handle the scheduling and the DAG nodes for non-DWH dependencies, such as loading or checking for files and kicking off tasks that need to run after the DWH refresh.
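
To make the shape concrete, here is a rough plain-Python sketch of that task graph. It is not Airflow code: the task names are made up for illustration, and the standard library's `graphlib` stands in for the scheduler just to show the ordering, with the entire dbt project collapsed into one node.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
coarse_dag = {
    "check_for_files": set(),
    "load_files": {"check_for_files"},
    "dbt_run": {"load_files"},            # the whole dbt project as one node
    "post_refresh_tasks": {"dbt_run"},    # work that needs the refreshed DWH
}


def run_order(dag):
    """Return one valid execution order in which dependencies come first."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task in run_order(coarse_dag):
        print(task)
```

The orchestrator only ever sees four nodes here; everything dbt does internally is opaque to it, which is the visibility trade-off mentioned in the parent comment.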

I find that once it is set up, it is extremely easy for small teams to follow the pattern, and the single view of all the running pipelines is a great benefit - as is the handling of logic such as last successful runs, which would have to be implemented manually with simple cron jobs.

I'm not too familiar with dbt, but what was the reason you chose to have a single dbt node rather than translating the dependencies into an Airflow DAG?
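
For anyone wondering what that translation would involve: dbt writes its dependency graph to `target/manifest.json`, where each entry under `nodes` lists its parents in `depends_on.nodes`. The sketch below assumes that layout and uses a hand-written, heavily trimmed stand-in for a manifest (the `shop` project and model names are hypothetical). It only derives model-to-model edges; turning each model into an Airflow task is left out.

```python
from graphlib import TopologicalSorter

# Hand-written, trimmed stand-in for target/manifest.json (hypothetical project).
sample_manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.stg_customers": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.customers"]},
        },
        "model.shop.customer_orders": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]
            },
        },
        "test.shop.not_null_customer_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.customer_orders"]},
        },
    }
}


def model_dependencies(manifest):
    """Map each dbt model to the set of models it depends on.

    Sources, tests, and other non-model nodes are dropped, so the result
    contains only the edges a per-model task graph would need.
    """
    nodes = manifest["nodes"]
    models = {
        uid for uid, node in nodes.items()
        if node.get("resource_type") == "model"
    }
    return {
        uid: {
            dep
            for dep in nodes[uid].get("depends_on", {}).get("nodes", [])
            if dep in models
        }
        for uid in models
    }


if __name__ == "__main__":
    deps = model_dependencies(sample_manifest)
    # Print the models in an order where parents always precede children.
    for uid in TopologicalSorter(deps).static_order():
        print(uid, "<-", sorted(deps[uid]))
```

The cost of going this route is that the orchestrator's graph has to be regenerated whenever the dbt project changes, which is one reason to keep dbt as a single node.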