by Hippocrates
1421 days ago
I despise Airflow and how cemented it is as data infrastructure. It's such a useful and basic concept, but it's a nightmare to manage and it works like junk. It's taken me 3 separate jobs over 7 years to realize that it's probably not our fault. Everyone seems to struggle with the same things: a flaky scheduler that is slow to run tasks, and confusing, redundant-sounding settings that apply at up to three different levels (environment, job, task). It invites less experienced users to write a sea of spaghetti code in a monolithic DAGs repo. People wind up doing heavy data munging in Python operators, which clobbers scalability and reliability. It also can't handle a large number of parallel tasks or frequent runs. It seems to have miserable scalability for the resources given, and bad controls for autoscaling. The UI feels dated and unintuitive. XComs seem useful to everyone but work like crap and are actually an anti-pattern. I've also tried it on Cloud Composer (Google-managed), and automated upgrades always trashed the cluster. It's not well designed for GKE because it writes logs to files and requires StatefulSets. Testing the code is a huge burden due to the vast environment and dependencies needed to make it work locally. I'm eager to rid my life of it and test out Temporal for some of the high concurrency/frequency cases we have.