by contravariant 1588 days ago
Isn't the main selling point of Airflow the bundling in the first place? Why would you want many different specialized tools to manage scheduled tasks?
3 comments

Cue the famous Jim Barksdale quote: "There’s only two ways to make money in business: One is to bundle; the other is unbundle."
I think there are two factors at play here:

1) Specialized tools reduce the amount of engineering overhead. As a business, I primarily care about time to value. If I can use specialized SaaS to get my data centralized, clean, and synced across my tools in a week, why would I want to spend months building all of these processes from scratch?

Sure, I lose control, visibility, and more... but I was able to deliver value 3 months ahead of schedule.

2) Existing tools like Airflow are highly technical to get started with. You can't just focus on building out scripted solutions. You have to set up and manage the infrastructure. You have to sift through the tool's documentation to understand how to build DAGs effectively. You have to mix platform logic into your business logic to make sure your code will run on Airflow.

Because the demand for data professionals is high and the supply is low, the technology ends up trying to offset the need for those highly technical skills in your organization.

I get what you're saying, but trying to make sure your code will run on Airflow is the wrong way of thinking about it, IMHO. You should be trying to get Airflow to make sure your code runs (whether in Airflow or anywhere else).

A lot of the stuff we do with Airflow is basically just sending commands, looking at the result, and handling any errors. This part is generic enough that you usually only need to implement it once for whatever platform your code is running on.
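
To make the "send a command, look at the result, handle errors" idea concrete, here is a minimal sketch using only the Python standard library. It is not Airflow code; `run_command`, `CommandResult`, and `CommandFailed` are hypothetical names for illustration, and a local subprocess stands in for whatever platform actually runs your code.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Outcome of one command: exit status plus captured output."""
    returncode: int
    stdout: str
    stderr: str


class CommandFailed(RuntimeError):
    """Raised when a command exits with a non-zero status."""


def run_command(args, timeout=60):
    """Send a command, look at the result, and raise if it failed.

    Assumption: the 'platform' here is just a local subprocess. For a
    remote platform you would swap the subprocess call for that
    platform's client, keeping the same result and error handling.
    """
    completed = subprocess.run(
        args, capture_output=True, text=True, timeout=timeout
    )
    result = CommandResult(
        returncode=completed.returncode,
        stdout=completed.stdout.strip(),
        stderr=completed.stderr.strip(),
    )
    if result.returncode != 0:
        raise CommandFailed(
            f"{args!r} exited with {result.returncode}: {result.stderr}"
        )
    return result
```

A scheduler task would then be a thin call such as `run_command([sys.executable, "my_job.py"])` (with `my_job.py` as a placeholder script), so the business logic itself never needs to know which scheduler triggered it.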

The tricky bit is when your DAG crosses platforms, but that's always a problem. If anything, it's easier to solve when the tool scheduling tasks isn't part of the platform (note, however, that Airflow is not a tool for solving dataflow, though some glue code in Python often works wonders).

Exactly my thoughts as well. I have one point where I can see if all the remote services that I am using are operating correctly. I don't need to connect to various other apps to figure this out.
This is why we moved to Airflow instead of lots of cron jobs: a centralized place to look, logging, etc.
I ran my last startup on cron jobs. Many times I didn't notice when a job didn't execute. That was an immediate value proposition for me.
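
For anyone still on plain cron, the "didn't notice the job didn't execute" problem can be approximated with a small central record of last successes. This is only a sketch under assumptions: `JobMonitor` and its methods are hypothetical names, state is kept in memory rather than in a shared store, and it is not how Airflow tracks runs internally.

```python
from datetime import datetime, timedelta


class JobMonitor:
    """Remembers when each job last succeeded so silent misses become visible."""

    def __init__(self):
        self._max_gap = {}       # job name -> longest allowed time between successes
        self._last_success = {}  # job name -> time of the most recent success

    def register(self, name, max_gap):
        """Declare a job and how long it may go without succeeding."""
        self._max_gap[name] = max_gap

    def record_success(self, name, when):
        """Called by the job itself (or its wrapper) after it finishes cleanly."""
        self._last_success[name] = when

    def stale_jobs(self, now):
        """Return the names of jobs that never ran or are overdue, sorted."""
        stale = []
        for name, max_gap in self._max_gap.items():
            last = self._last_success.get(name)
            if last is None or now - last > max_gap:
                stale.append(name)
        return sorted(stale)


monitor = JobMonitor()
monitor.register("nightly_backup", timedelta(hours=24))
monitor.register("hourly_report", timedelta(hours=1))
monitor.record_success("nightly_backup", datetime(2024, 1, 1, 0, 0))
# hourly_report has never reported in, so it is flagged as stale.
print(monitor.stale_jobs(datetime(2024, 1, 1, 2, 0)))
```

The point is the single place to look: a job that fails to start never calls `record_success`, so it shows up as stale instead of disappearing quietly.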