Hacker News
by dopamean 1484 days ago
If you could go back and use something else instead what would you choose?
2 comments

It's a good question. I believe Airflow was probably the right choice at the time we started. We were a small team, and deploying Airflow was a major shortcut that more or less handled orchestration so we could focus on other problems. With the benefit of hindsight, we would have been better off writing our own scheduler some time in the first year of the project. As I mentioned in my OP, we have a set of well-defined workflows that are just templatized for different jobs. A custom-built orchestration system that could perform those steps in sequence and trigger downstream workflows would not be that complicated. But this is how software engineering goes: sometimes you take on tech debt, and it can be hard to know when it's time to pay it off. We did eventually reach a stable steady state, but with lots of hair-pulling along the way.
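To make the "not that complicated" claim concrete, here is a minimal sketch of what such a system could look like. It is not the commenter's actual design: the names `Workflow`, `Orchestrator`, and `make_workflow`, and the "orders" and "report" jobs, are all hypothetical, and it assumes steps are plain Python callables that share a context dict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A step is any callable that reads and mutates a shared context dict.
Step = Callable[[dict], None]


@dataclass
class Workflow:
    name: str
    steps: List[Step]
    # Names of workflows to trigger once every step here has succeeded.
    downstream: List[str] = field(default_factory=list)


class Orchestrator:
    """Hypothetical runner: steps in sequence, then downstream workflows."""

    def __init__(self) -> None:
        self.workflows: Dict[str, Workflow] = {}
        self.log: List[str] = []

    def register(self, workflow: Workflow) -> None:
        self.workflows[workflow.name] = workflow

    def run(self, name: str, context: dict) -> None:
        workflow = self.workflows[name]
        for step in workflow.steps:
            # An exception here propagates, so downstream workflows never fire.
            step(context)
            self.log.append(f"{name}:{step.__name__}")
        for child in workflow.downstream:
            self.run(child, context)


def make_workflow(job: str, downstream=()) -> Workflow:
    """Template: the same extract/load steps, parameterized by job name."""

    def extract(ctx: dict) -> None:
        ctx[job] = ["raw"]

    def load(ctx: dict) -> None:
        ctx[job].append("loaded")

    return Workflow(job, [extract, load], list(downstream))


orch = Orchestrator()
orch.register(make_workflow("orders", downstream=["report"]))
orch.register(make_workflow("report"))
orch.run("orders", {})
print(orch.log)
```

This sketch leaves out everything that makes a real scheduler hard: retries, cron-style triggering, persistence, parallelism, and cycle detection in the downstream graph.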
dbt tool. getdbt.com
Can dbt run arbitrary code? If it can, it's not well advertised in the documentation. Every time I've looked into dbt, I found that it's mostly a scheduled SQL runner.

The primary reason we run Airflow is that it can execute Python code natively, or other programs via Bash. It's very rare that a DAG I write is entirely SQL-based.

dbt has just opened a serious conversation about supporting Python models. I'm sure they'd value your viewpoint! https://github.com/dbt-labs/dbt-core/discussions/5261
You’re right. I think the strength of dbt is in the T part of ELT. I wrote ELT to make a distinction in principle from the traditional ETL. (E)xtract and (L)oad is the data ingestion phase that would probably be better served by Dagster, where you could use Python.

(T)ransform is decoupled and would be best served by set-based operations managed by dbt.

dbt is great, but it solves only a small part of what Airflow does.