by atomicity
2132 days ago
Nothing is more popular yet, but there are better-architected options out there. It has hit 17k GitHub stars and was used by my previous team.
I don't think anything will beat it unless something from the CI/CD or "cloud native" world moves in unexpectedly. The operators and scalability are somewhat useful.
I was happy with the UI compared to cron. Testing is a mess. Also, Airflow isn't CI/CD-friendly (though it's possible to get it to work). I'd recommend a managed option unless you have a skilled ops team. It reminds me of Hadoop in terms of how "exciting" it is to get set up, which isn't a good thing.
|
It does not help that the entire documentation is written from the point of view of people who are definitely not of the devops variety, doing things manually on their laptops, i.e. all the things you should never do in a production setup. Configuring this thing for production usage is largely undocumented and nontrivial. For things like running it with Docker or Terraform, you'll be piecing it together from Stack Overflow and various third-party GitHub repositories rather than from the official documentation, which merely hints at these things being possibilities.
It also does not help that the internals are kind of buggy and wonky. We had a really hard time getting the basic plumbing for running workers, queues, etc. to work properly. It would constantly grind to a halt and stop processing stuff. There's also this minutes-long uncertainty principle: "is it actually running, or still figuring out that it needs to catch up?!"
Also, the UI/UX is terrible IMHO. Think hitting cmd+r a lot, because automatic page refreshes are not a thing in Airflow, and absolutely everything requires multiple clicks to navigate complex dialogs (modal, naturally). So unless you have just manually reloaded the page, you are looking at stale information: jobs that have long finished, green statuses that have turned red, etc. Even Jenkins/Hudson had auto reload 15 years ago. And given the significant overlap in functionality, you might actually be better off using that if all you need is the ability to run some simple job at specific intervals.
The only valid reason for using Airflow is the ecosystem of plugins. It's valid, and it's basically the same reason that people tolerated the craptastic experience that was managing Nagios back in the day. Horribly complicated to set up, terrible/primitive UI, loads of performance issues, nontrivial failure modes, etc., but world + dog used it and there were Nagios plugins for just about everything. I've been down that rabbit hole as well, and I'd say the experience is similar enough.
So, definitely use it in hosted form if you can, or avoid it altogether unless you really need it.