by nooorofe 681 days ago
Airflow is far from perfect, but I don't understand your concerns. I work in a big and messy company and an even messier department. We have jobs running in Databricks and Snowflake, and sometimes we read data from API endpoints or even files uploaded to SharePoint (my group is not building a DW). Airflow lets me organize all of it in a single workflow. At least I know that every failed job is reported by email and I don't need to search multiple systems: everything starts from Airflow.

> Why should biz logic that just needs to run Spark and interact with S3 now need to run a web server?

The webserver is mostly the UI. The scheduler service is what triggers the jobs.

We have groups that run everything through BashOperator, so there are no dependency issues that way.

You may have a very specific use case in mind. For me, the main points of using Airflow are:

* Single orchestration center: manual job control (stop, pause, rerun), backfill; automated scheduler/retry; built-in notification

* Framework built around a "reporting period": it enforces the correct abstraction. If a data batch is broken, I can rerun it and then rerun all dependent downstream jobs. How do you fix data in an event-driven workflow?

* Managing dependencies
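To make the "rerun the batch and everything downstream" point concrete, here is a rough sketch in plain Python. It is not Airflow code and not how Airflow implements it internally. The task names and the `DOWNSTREAM` graph are made up for illustration. It only shows the idea: find everything that depends on the broken task, then run it in dependency order.

```python
from collections import deque

# Hypothetical task graph: each task maps to the tasks that consume its output.
DOWNSTREAM = {
    "load_api": ["clean"],
    "load_sharepoint": ["clean"],
    "clean": ["aggregate", "export"],
    "aggregate": ["report"],
    "export": [],
    "report": [],
}


def tasks_to_rerun(broken: str, graph: dict[str, list[str]]) -> list[str]:
    """Return the broken task plus everything downstream, in a valid run order."""
    # Step 1: breadth-first search to collect every task affected by the failure.
    affected = {broken}
    queue = deque([broken])
    while queue:
        for child in graph[queue.popleft()]:
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # Step 2: topological sort (Kahn's algorithm) over the affected tasks only,
    # so that no task runs before the affected tasks it depends on.
    indegree = {task: 0 for task in affected}
    for task in affected:
        for child in graph[task]:
            indegree[child] += 1
    ready = deque(sorted(task for task in affected if indegree[task] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(graph[task]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order
```

With this toy graph, a broken `clean` batch means rerunning `clean`, `aggregate`, `export` and `report`, while the two upstream loads are left alone.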

In most cases all Airflow does is run your job and pass it a "date" parameter. You can test your code without Airflow: just pass it a date and run it from the command line.
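As a minimal sketch of what such a date-parameterized job could look like, assuming a daily reporting period: the `--date` flag, the `run_job` function and the output string are all hypothetical, and the real business logic is left as a comment.

```python
import argparse
from datetime import date, timedelta


def run_job(run_date: date) -> str:
    """Process the batch for one reporting period (assumed here to be one day)."""
    period_start = run_date
    period_end = run_date + timedelta(days=1)
    # The real work (Spark, Snowflake, an API call, ...) would go here.
    return f"processed batch [{period_start.isoformat()}, {period_end.isoformat()})"


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Date-parameterized batch job")
    parser.add_argument(
        "--date",
        type=date.fromisoformat,  # expects YYYY-MM-DD
        default=date.today(),
        help="reporting date as YYYY-MM-DD (defaults to today)",
    )
    args = parser.parse_args(argv)
    result = run_job(args.date)
    print(result)
    return result


if __name__ == "__main__":
    main()
```

Saved as, say, `job.py`, this runs locally with `python job.py --date 2024-01-31`. Under Airflow the same command could be called from a BashOperator with the built-in `{{ ds }}` template in place of the literal date, so the job itself never needs to import Airflow.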