by the_af
1588 days ago
As a newcomer to the world of data, I have no strong opinions about Airflow. It replaced a bunch of disparate cron jobs, so it's definitely better than what was there before. There are things I like and things I don't about it.

The UI is awful; contrary to what the article states, I don't know anyone who likes it. I like that it's centralized and that it's all Python code. Deploying it and fine-tuning the config for a variety of workloads can be a pain. Sometimes sensors don't work right. Tasks sometimes get evicted and killed for obscure reasons. Zombie tasks are enough of a pain that you'll see plenty of requests for help online.

That said, replacing it with a bunch of disparate tools again? Seems like a step backwards. And now, instead of a single tool, your org has to vet, secure, understand and monitor a bunch of different tools? It's bad enough with only one... What am I missing?

PS: data analysis/engineering as a field seems new and immature enough that, in my humble opinion, we should be focusing on developing good practices and theory, instead of deprecating existing (and pretty recent) tech at an ever-increasing pace.
|
Airflow is... not amazing. But by the standards of horrible enterprise software we've all been subjected to, it's not that bad.
If you're complaining about Airflow, wait for the day you're forced to use an internally built database client.
That's Afghanistan.
Our proprietary AWS wrapper takes 45 damn minutes on a good day to allocate a VM. The AMI is built in two minutes. TWO.
I'm sure in 5 years Dagster and Prefect will have improved gradually in lots of incremental ways. For now Airflow is pretty solid.