by tchiotludo
1551 days ago
I know that some issues are fixed in Airflow 2; that release was a large improvement. But not all issues are resolved by it. The performance issue is still there: just launch Airflow and submit a thousand DAG runs with a simple Python sleep(1), and you will become CPU-bound very quickly, with a long total duration. Airflow is not designed for a lot of short-duration tasks. With event-driven data flows, it's really complicated to manage. Imagine a flow that is triggered for each store, for example (thousands of stores, with 10+ tasks for each one): Airflow will not be able to manage this kind of workflow quickly (and that's not its goal). Airflow was clearly designed to handle a small number of tasks (hundreds) that run for a long time. As for XCom, Airflow stores it in the database, so you can't put real data in it; you can only store small values (a database is not the place to store big files). In Kestra, we provide a storage that natively allows passing large data (GB, TB, ...) between tasks, without the pain on multi-node clusters.
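To make the XCom point concrete, here is a minimal sketch of the usual workaround: write the large payload to storage and pass only a small reference between tasks. This is plain Python, not Airflow's or Kestra's API; `produce`, `consume`, and the local directory standing in for storage are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


def produce(storage_dir: str, rows: list) -> dict:
    """Write the large payload to storage and return a small reference to it.

    Hypothetical helper: storage_dir stands in for whatever storage the tasks share.
    """
    fd, path = tempfile.mkstemp(dir=storage_dir, suffix=".json")
    with os.fdopen(fd, "w") as fh:
        json.dump(rows, fh)
    # Only this small dict would travel through the metadata database.
    return {"path": path, "row_count": len(rows)}


def consume(ref: dict) -> int:
    """Load the payload back from storage via the reference and aggregate it."""
    with open(ref["path"]) as fh:
        rows = json.load(fh)
    return sum(row["amount"] for row in rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage:
        rows = [{"store": i, "amount": i * 2} for i in range(1000)]
        ref = produce(storage, rows)
        print(ref["row_count"], consume(ref))
```

On a single machine a local directory is enough, but on a multi-node cluster the path has to point at storage every worker can reach, and that is the part you end up managing yourself.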