by glogla 2855 days ago
We tested it, but the performance was bad. We needed a hundred workflows with a few hundred tasks each, and Airflow would just topple over daily.

We ended up with a proprietary tool from Teradata that's basically Airflow written in Perl - but it can handle all the work.

Other than scalability, Airflow is pretty nice.

5 comments

[full disclosure, I'm the creator of Airflow]

Many environments run tens of thousands of concurrent tasks, and hundreds of thousands of tasks daily. The list of companies using Airflow speaks for itself https://github.com/apache/incubator-superset#who-uses-apache...

But hey, it's like anything: you have to do a bit of work to get distributed systems to run at scale. There are now hosted solutions to help with that (Google Cloud Composer and Astronomer.io).

I run an Airflow instance that does millions of tasks per month across dozens of DAGs. There's some performance tuning involved in the configuration file, and of course you need the underlying resources available, but Airflow has scaled to this level well for us.
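For anyone wondering what "tuning in the configuration file" might look like, here is a rough sketch. The comment above doesn't say which settings were changed, so the option names are my assumption of the usual 1.10-era concurrency knobs in airflow.cfg (`parallelism`, `dag_concurrency`, `max_active_runs_per_dag`, and the scheduler's `max_threads`), and the numbers are placeholders, not recommendations. The helper names (`render_overrides`, `to_ini`, `as_env_vars`) are made up for this example. It only builds the overrides as an INI fragment and as `AIRFLOW__SECTION__KEY` environment variables; it does not touch a real install.

```python
import configparser
import io

# Placeholder values: assumptions for illustration, not taken from the thread.
TUNING = {
    "core": {
        "parallelism": "64",             # task instances running across the whole install
        "dag_concurrency": "32",         # task instances running per DAG
        "max_active_runs_per_dag": "4",  # concurrent DAG runs per DAG
    },
    "scheduler": {
        "max_threads": "4",              # scheduler worker threads
    },
}


def render_overrides(tuning):
    """Load the nested dict into a ConfigParser (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_dict(tuning)
    return parser


def to_ini(parser):
    """Serialize the overrides as an INI fragment for airflow.cfg."""
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def as_env_vars(parser):
    """Express the same overrides as AIRFLOW__SECTION__KEY variables."""
    return {
        f"AIRFLOW__{section.upper()}__{key.upper()}": value
        for section in parser.sections()
        for key, value in parser[section].items()
    }


if __name__ == "__main__":
    overrides = render_overrides(TUNING)
    print(to_ini(overrides))
    for name, value in sorted(as_env_vars(overrides).items()):
        print(f"{name}={value}")
```

Whether any of these knobs matter depends on the executor and on the metadata database behind it, so treat this as a starting point for experiments rather than a fix.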

If you are able to reproduce the problem and can post to the dev mailing list, we are happy to help... especially so if it gets you off of a proprietary tool written in Perl ;).

1.10 was just released and adds a ton of commits. I'd really encourage you to give Airflow another shot if you have the time.

Millions per MONTH is just ridiculous. Stay with what works; revisiting this slow Python tool will be a cost and time sink. It's a mistake companies let engineers with a millennial complex make all too often.

It sounds weird that you had these problems. Silly question, but might it have been a database optimization issue?

My guess is that you had an underpowered database instance backing Airflow.

Perl saved the day again.