Hacker News
by kfk 2366 days ago
I mean the same thing you mean. My issue with Airflow is that it’s complicated and doesn’t adapt well to cloud computing. Dask runs on AWS EMR and EKS, Kubernetes, etc. Unfortunately, orchestration is a lot more complicated than it looks: parallel execution, retries, logs, status tracking, email notifications. Airflow doesn’t really tackle all orchestration work.

Airflow Maintainer here: what you are describing is exactly what Airflow takes care of (or should).

I wonder what your issue is/was? Notebooks are supported by means of a Papermill operator (equivalent to how Netflix operationalizes notebooks) or PythonOperator/BashOperator which would just wrap around your notebook.

However, to parallelize tasks Airflow needs to know a bit more, so you might have found that you had to break up your notebook into individual tasks that combine into a DAG. Is that what you meant?
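
To illustrate why a scheduler needs the task boundaries, here is a minimal sketch using only the Python standard library (3.9+). It is not Airflow's API; `run_dag`, the task names, and the toy data are all made up for the example. Once the dependencies are explicit, independent tasks can run side by side, which is not possible when everything lives in one opaque notebook run.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run callables in dependency order; independent tasks run in parallel.

    tasks: dict of task name -> callable taking a dict of upstream results
    deps:  dict of task name -> set of upstream task names
    (Hypothetical helper for illustration, not part of Airflow or Dask.)
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose upstream tasks have all finished.
            for name in sorter.get_ready():
                upstream = {d: results[d] for d in deps.get(name, set())}
                running[pool.submit(tasks[name], upstream)] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)
    return results


# Two independent "extract" steps feed one "combine" step.
tasks = {
    "extract_a": lambda up: [1, 2, 3],
    "extract_b": lambda up: [10, 20],
    "combine": lambda up: sum(up["extract_a"]) + sum(up["extract_b"]),
}
deps = {"combine": {"extract_a", "extract_b"}}
results = run_dag(tasks, deps)
```

Here `extract_a` and `extract_b` can be submitted together because neither depends on the other, while `combine` waits for both. A real orchestrator adds retries, logging, and status tracking on top of this idea.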

With Dask we code the workflow in the notebook and run it in the notebook. We don’t have to fiddle with operators, as every task is Python code. Dask is easy to install, which is important since each analyst has to be able to test the workflows before sending them to production. Finally, by programming our own scheduler we can build the things we need. For instance, we are able to listen to SQL tables and API changes and trigger work based on that. Anyway, I am sure I could make Airflow work too, but it’s a harder fit than Dask.
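
For anyone wondering what "listening to SQL tables" can look like, here is a rough sketch of the polling idea using only the standard library's `sqlite3`. It is not our actual scheduler and it does not use Dask; the `events` table, its columns, and `poll_new_rows` are assumptions made up for the example. The idea is to remember the highest row id already handled (a watermark) and trigger work only for rows above it.

```python
import sqlite3


def poll_new_rows(conn, last_id, handler):
    """Pass the payload of every row with id > last_id to handler.

    Returns the new watermark (highest id handled so far).
    The `events` table and its columns are hypothetical.
    """
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for row_id, payload in rows:
        handler(payload)
        last_id = row_id
    return last_id


# In-memory database standing in for the real SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("load_sales",), ("load_users",)],
)

triggered = []
watermark = poll_new_rows(conn, 0, triggered.append)
# A second poll with no new rows triggers nothing.
watermark = poll_new_rows(conn, watermark, triggered.append)
```

In practice the poll would run on a timer and the handler would submit a workflow instead of appending to a list.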