by grillorafael
2849 days ago
Apache Airflow seems like a really interesting project, but I don't know anyone using it who can give real-life pros and cons. Does anyone here dare to give some feedback in that sense? PS: Why do people still use Prezi? It gives me vertigo.
I've started using it for personal projects, and I'm slowly probing for adoption in our shop, where applicable.
The good points I have seen:
- It's simple Python, and not XML like Azkaban. I've seen people with less technical expertise build useful stuff quickly, and automate their workflows.
- Very good UI, which just lets you do what you need without fuss.
- Easy to build modular and interactive flows, with interesting features such as sensors, communication between operators, triggers, etc.
- Everything is stored in a database, which I can query for anything related to the processes that ran and to Airflow itself
- Its source is grok-able and documented, and it lets you easily add your own modules (or "operators" as they're called)
- Many add-on modules for operators already exist from the community
- Easier to get the team to version-control its process flows
Some cons, from the light use I've seen:
- If you scale beyond a point, you have to take care of scaling the database as well, adding DBA work
- I've encountered some issues with the scheduler, backfilled jobs, and `depends_on_past`, but that might be down to my limited experience
- People may start to use specific external dependencies/modules, which you will then need to keep track of
- Uses its own lingo/terminology, which you'll have to learn and use
- Uses the system time, so you can't run schedules in different time zones
I have high hopes for the project, as it's currently incubating at the Apache Software Foundation, and I hope it remains minimal and keeps its present scope.
If it seems interesting to you, my suggestion is to start small, keep in mind that it handles relations between tasks and not data, and try to automate some easy bash script that you currently handle with cron.
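To make the "relations between tasks and not data" point concrete, here is a minimal sketch in plain Python. It deliberately does not use Airflow's API: the task names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) only stands in for the dependency resolution a workflow scheduler does for you.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it must wait for.
# Note that only the ordering is described here, not the data the tasks exchange.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the given dependency mapping."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A cron-driven bash script that runs these steps one after another could be the first thing to move over: each step becomes a task, and the ordering that was implicit in the script becomes an explicit dependency.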