by emef
1484 days ago
We've also been running Airflow for the past 2-3 years at a similar scale (~5000 DAGs, 100k+ task executions daily) for our data platform. We weren't aware of a great alternative when we started. Our DAGs are all config-driven: each config populates one of a few templates (e.g. ingestion = ingest > validate > publish > scrub PII > publish), so we really don't need all the flexibility that Airflow provides.

We have had SO many headaches operating Airflow over the years, and each time we invest in fixing an issue I feel more and more entrenched. We've hit scaling issues at the k8s level, scheduling overhead in Airflow, random race conditions deep in the Airflow code, etc. Considering we have a pretty simple DAG structure, I wish we had gone with a simpler, more robust and scalable solution (even if that meant rolling our own scheduler) for our specific needs.

Upgrades have been an absolute nightmare and so disruptive. The scalability improvements in Airflow 2 were a boon for our runtimes, since before that we would often have 5-15 minutes of scheduling overhead between tasks, but man, it was a bear of an upgrade. We've since tried multiple times to upgrade past the 2.0 release and hit issues every time, so we are just done with it. We'll stay at 2.0 until we eventually move off Airflow altogether.

I stood up a Prefect deployment for a hackathon and found that it solved a ton of the issues we have with Airflow (sane deployment options, not the insane file-based polling that Airflow does). We looked into it ~1 year ago or so. I haven't heard a lot about it lately; I wonder if anyone has had success with it at scale.
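For anyone wondering what "config-driven DAGs that populate a template" might look like, here is a minimal sketch in plain Python. It is not the commenter's actual code and uses no Airflow API; `TEMPLATES`, `Task`, and `build_dag` are hypothetical names, and the only detail taken from the comment is the ingestion step order.

```python
from dataclasses import dataclass

# Hypothetical template table: template name -> ordered step names.
# The step order mirrors the ingestion example in the comment above.
TEMPLATES = {
    "ingestion": ["ingest", "validate", "publish", "scrub_pii", "publish"],
}


@dataclass(frozen=True)
class Task:
    dag_id: str
    task_id: str
    upstream: tuple  # task_ids that must finish before this task starts


def build_dag(config: dict) -> list:
    """Expand one config entry into a linear chain of tasks."""
    steps = TEMPLATES[config["template"]]
    tasks = []
    previous = None
    for index, step in enumerate(steps):
        # Prefix with the position so a repeated step ("publish") stays unique.
        task_id = f"{index}_{step}"
        upstream = (previous,) if previous is not None else ()
        tasks.append(Task(config["dag_id"], task_id, upstream))
        previous = task_id
    return tasks


# Hypothetical config entry; a real platform would load thousands of these.
dag = build_dag({"dag_id": "orders", "template": "ingestion"})
for task in dag:
    print(task.dag_id, task.task_id, "after", task.upstream)
```

Because every DAG is a fixed chain, a purpose-built scheduler only has to walk this list in order, which is the kind of simplification the comment is wishing for.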
Luigi doesn't force you into using a central orchestrator for executing and tracking workflows. Tracking and updating task state are open functions left to the programmer to fill in.
It's probably geared toward more expert programmers who work close to the metal and care less about GUIs than about a high degree of control and flexibility.
It's one of those frameworks where the code that is not written is sort of a killer feature in itself. But it's definitely not for everyone.
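To make the "state tracking is left to the programmer" point concrete, here is a rough stdlib-only sketch of the idea. It deliberately does not use Luigi itself: `FileTask` and `build` are hypothetical names, loosely modeled on Luigi's `Task.complete()` and `Task.output()` hooks, and the "state" is nothing more than whether an output file exists.

```python
import os
import tempfile


class FileTask:
    """A task whose completion state is just 'does my output file exist?'."""

    def __init__(self, path, upstream=()):
        self.path = path
        self.upstream = tuple(upstream)

    def complete(self):
        # The programmer decides what "done" means; here it is a file check.
        return os.path.exists(self.path)

    def run(self):
        with open(self.path, "w") as fh:
            fh.write("done\n")


def build(task, ran):
    """Run a task and its unfinished dependencies; no central state store."""
    if task.complete():
        return
    for dep in task.upstream:
        build(dep, ran)
    task.run()
    ran.append(task.path)


workdir = tempfile.mkdtemp()
extract = FileTask(os.path.join(workdir, "extract.txt"))
report = FileTask(os.path.join(workdir, "report.txt"), upstream=[extract])

first = []
build(report, first)   # nothing exists yet, so both tasks run
second = []
build(report, second)  # outputs already exist, so nothing runs
print(len(first), len(second))
```

Swapping the file check for a database row or an object-store marker is a one-method change, which is the flexibility being described, and also why it asks more of the programmer.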