by tchiotludo 1551 days ago
I know that some issues are fixed in Airflow 2; that release was a large improvement. But it does not resolve every issue.

The performance issue is still there: just launch Airflow and submit a thousand DAG runs of a simple Python sleep(1), and you will become CPU-bound very quickly, with a long total run time. Airflow is not designed for a lot of short-duration tasks. With event-driven data flows, it's really complicated to manage.

Imagine a flow that is triggered for each store, for example (thousands of stores, with 10+ tasks for each one): Airflow will not be able to run this kind of workflow quickly (and that's not its goal). Airflow was clearly designed to handle a small number of tasks (hundreds) that each run for a long time.

As for XCom, Airflow stores it in its database, so you can't put real data in it; you can only store small values (a database is not the place for big files). In Kestra, we provide a storage layer that natively allows passing large data (GB, TB, ...) between tasks, without the pain on multi-node clusters.
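To make the idea concrete, here is a minimal sketch of the "pass a reference, not the payload" pattern, using only the Python standard library. It is not Airflow's XCom API or Kestra's storage API; `extract`, `aggregate`, and the file name are hypothetical names chosen for illustration, and a temporary directory stands in for shared storage.

```python
import json
import tempfile
from pathlib import Path


def extract(storage_dir: Path) -> str:
    """Upstream task: write the payload to storage and return only its location."""
    # Hypothetical data standing in for a large extract.
    rows = [{"store_id": i, "sales": i * 10} for i in range(1000)]
    out = storage_dir / "extract.json"
    out.write_text(json.dumps(rows))
    # Only this short string would need to live in a metadata database.
    return str(out)


def aggregate(payload_path: str) -> int:
    """Downstream task: load the payload from storage using the reference."""
    rows = json.loads(Path(payload_path).read_text())
    return sum(row["sales"] for row in rows)


if __name__ == "__main__":
    # A temporary directory plays the role of shared storage (an assumption).
    with tempfile.TemporaryDirectory() as tmp:
        ref = extract(Path(tmp))
        print(aggregate(ref))
```

On a multi-node cluster the local directory would have to be replaced by something every worker can reach (object storage, a shared filesystem), which is the painful part when the orchestrator does not handle it for you.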

1 comments

Airflow 2 was released in 2020. You're saying you knew that these issues were fixed, and then an article is published on your webpage in 2022 knowingly comparing against the technical properties of a major version release 2 years behind? That is not a good look.
First of all, the published article is a retrospective; we are talking about a decision made in 2019. Can't we talk about the past that led us to that choice?

Second, not all issues: some of them are fixed, but there are still major ones. Just search Google for issues with scaling Airflow in production; even with Airflow 2, it's still complicated. Airflow still uses a lot of CPU for doing nothing but waiting on some API call. Just try to run 5000 tasks that sleep (simulating an API call) in Airflow and you will see the challenge of scaling it.
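For a point of comparison, here is a rough sketch of that same workload (5000 tasks that only sleep) run outside any orchestrator, with plain asyncio from the standard library. It is not an Airflow benchmark and measures nothing about Airflow; `fake_api_call`, `run_many`, and `measure` are hypothetical names. It only reports the wall-clock and CPU time of the waiting itself, which you can then compare against whatever your scheduler needs for the same job.

```python
import asyncio
import time


async def fake_api_call(seconds: float) -> None:
    """Stand-in for a task that only waits on a remote API."""
    await asyncio.sleep(seconds)


async def run_many(n_tasks: int, seconds: float) -> None:
    """Run all the waiting tasks concurrently on one event loop."""
    await asyncio.gather(*(fake_api_call(seconds) for _ in range(n_tasks)))


def measure(n_tasks: int = 5000, seconds: float = 1.0) -> tuple[float, float]:
    """Return (wall-clock seconds, CPU seconds) spent running the tasks."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    asyncio.run(run_many(n_tasks, seconds))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure()
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s")
```

The exact numbers depend on the machine, so none are quoted here; the point is only to have a baseline for what "waiting on an API call" costs when nothing else is in the way.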

Third, Airflow still has design issues that will not allow you to handle some kinds of pipelines.

Last, I'm not here to fight against Airflow; some people love it, some people hate it. We have made a completely different choice about designing and scaling data pipelines, and I let people use what they like. For me, Airflow (and other workflow managers) doesn't fit.