by zibarn 1423 days ago
I'm not experienced here, but out of genuine interest: can you tell me what problems Airflow solves that can't be handled by Celery and RabbitMQ?
4 comments

I have not used Celery + RabbitMQ, but I assume that combo is like Sidekiq + Redis, or any other job queue + worker system.

Airflow packages those things together and adds some additional features:

- UI with graph, Gantt, logs and other views of the workflow
- Users and permissions
- Places to store config
- Mechanisms for passing small data between tasks
- Various "sensors" for triggering workflows
- Various operators that interact with common data-oriented systems (BigQuery, Snowflake, S3, you name it). These are basically libraries that expose a config-forward API.
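To make "config-forward" and "passing small data between tasks" a bit more concrete, here is a rough stdlib-only sketch. It is not Airflow's API: `ConstantOperator`, `UppercaseOperator`, `run` and the shared `store` dict are all hypothetical names made up for illustration. The idea is just that each task is declared by filling in fields, and small results are handed to later tasks through a shared key-value store.

```python
from dataclasses import dataclass


@dataclass
class ConstantOperator:
    """Hypothetical operator: publishes a fixed value under its task_id."""
    task_id: str
    value: str

    def execute(self, store):
        store[self.task_id] = self.value


@dataclass
class UppercaseOperator:
    """Hypothetical operator: reads another task's small result and transforms it."""
    task_id: str
    source_task_id: str

    def execute(self, store):
        store[self.task_id] = store[self.source_task_id].upper()


def run(tasks):
    """Run tasks in the given order, sharing small results via a dict."""
    store = {}
    for task in tasks:
        task.execute(store)
    return store


result = run([
    ConstantOperator(task_id="extract", value="hello"),
    UppercaseOperator(task_id="transform", source_task_id="extract"),
])
print(result)
```

The point is that the pipeline author mostly writes configuration (task ids and parameters) while the operator classes carry the logic, which is roughly what the pre-made operators buy you.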

Probably the main selling point is the pre-made operators, but in short it is a complete solution with bells and whistles that aligns itself with the data ecosystem.

An analogy is "can you tell what problems Django solves that can't be handled by wsgi and psycopg?" Nothing fundamentally different, but life is a whole lot easier with Django. Honestly if you're doing data engineering and you haven't spent time with a good DAG runner, you're doing yourself a real disservice.

My sibling comment did a good job explaining, but the UI + configurable storage + configurable triggers all out of the box make life a lot easier.

Django is easier when you want to do things only the "Django" way. However, once you need something done differently, it quickly shows how rigid and brittle it really is, and you'll find yourself fighting it every step of the way.
Perhaps unwittingly, you've hit upon people's exact frustrations with Airflow! :)
Expressing the problem you are trying to solve as a DAG is idiomatic in Airflow, but expressing your problem in terms of queue processing is idiomatic in Celery.

    a b c d vs. a (bc) d
They make different design decisions about what to surface in the UX and what to make easy, because each one thinks about the problem in terms of a different data structure.
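A small sketch of the two ways of thinking, using only the Python standard library (Python 3.9+ for `graphlib`). This is neither Airflow nor Celery code; `run_as_dag` and `run_as_queue` are hypothetical helpers, and the four task names are one possible reading of the diagram above. In the DAG version the dependencies are declared up front, so the runner can see which tasks may run together; in the queue version jobs are simply taken in arrival order.

```python
from collections import deque
from graphlib import TopologicalSorter


def run_as_dag(dependencies):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


def run_as_queue(jobs):
    """Process jobs strictly in the order they were enqueued."""
    pending = deque(jobs)
    processed = []
    while pending:
        processed.append(pending.popleft())
    return processed


# Each key maps a task to the set of tasks it depends on.
dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

print(run_as_dag(dag))
print(run_as_queue(["a", "b", "c", "d"]))
```

With the DAG, `b` and `c` land in the same wave because the structure says they are independent. The queue has no such knowledge unless you encode it yourself, which is the "different data structure" point.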
Airflow with a Celery backend is a pretty sweet combination. In that setup, Airflow just gives you a nice scheduler to manage all the Celery jobs.