by jillesvangurp 2131 days ago
I can confirm all of this. I was involved in setting up Airflow recently and we had a rather rough time, because it is kind of a half-assed solution. It's basically a framework that lets you do stuff through a lot of plugins/connectors that may or may not be useful to you, with rather large variation in completeness, bugginess, documentation, and utility. A lot of it is kind of sketchy or even actively harmful, but there are definitely some useful things as well.

It does not help that the entirety of the documentation is written from the point of view of people who are definitely not of the devops variety, doing things manually on their laptop, i.e. all the things you should never do in a production setup. Configuring this thing for production usage is largely undocumented and non-trivial. You'll be piecing things together from Stack Overflow and various third-party GitHub repositories (e.g. for using Docker, Terraform, etc.) rather than from the official documentation, which merely hints that these things are possible.

It also does not help that the internals are kind of buggy and wonky. We had a really hard time getting the basic plumbing for running workers, queues, etc. to work properly. It would constantly grind to a halt and stop processing stuff. There's also this minutes-long uncertainty principle: "is it actually running, or is it still figuring out that it needs to catch up?!"

Also, the UI/UX is terrible IMHO. Think hitting cmd+r a lot, because automatic page refreshes are not a thing in Airflow, and absolutely everything requires multiple clicks to navigate complex dialogs (modal, naturally). So unless you have just manually reloaded the page, you are looking at stale information: jobs that finished long ago, green statuses that have since turned red, etc. Even Jenkins/Hudson had auto-reload 15 years ago. And given the significant overlap in functionality, you might actually be better off using that if all you need is the ability to run some simple job at specific intervals.

The only valid reason for using Airflow is the ecosystem of plugins. It's a valid reason, and it's basically the same reason people tolerated the craptastic experience that was managing Nagios back in the day: horribly complicated to set up, a terrible/primitive UI, loads of performance issues, non-trivial failure modes, etc. But world + dog used it, and there were Nagios plugins for just about everything. I've been down that rabbit hole as well, and I'd say the experience is similar enough.

So, definitely use it in hosted form if you can, or avoid it altogether unless you really need it.