by blakeburch 1587 days ago
I think there are two factors at play here:

1) Specialized tools reduce the amount of engineering overhead. As a business, I primarily care about time to value. If I can use specialized SaaS to get my data centralized, clean, and synced across my tools in a week, why would I want to spend months building all of these processes from scratch?

Sure, I lose control, visibility, and more... but I was able to deliver value 3 months ahead of schedule.

2) Existing tools like Airflow are highly technical to get started with. You can't just focus on building out scripted solutions. You have to set up and manage the infrastructure. You have to sift through the tool's documentation to understand how to effectively build DAGs. You have to inject your business logic with platform logic to make sure your code will run on Airflow.

Because the demand for data professionals is high and the supply is low, the technology ends up trying to offset the need for those highly technical skills in your organization.

1 comment

I get what you're saying, but trying to make sure your code will run on Airflow is the wrong way of thinking about it, IMHO. You should be trying to get Airflow to make sure your code runs (whether in Airflow or anywhere else).

A lot of what we do with Airflow is basically just sending commands, looking at the result, and handling any errors. This part is generic enough that you usually only need to implement it once for whatever platform your code is running on.
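
To make that concrete, here's a rough sketch of the "send a command, check the result, handle errors" pattern. The names `run_command` and `CommandFailed` are made up for illustration (they are not part of Airflow), and the code only uses Python's standard library. Something like this could be called from an Airflow task, but nothing in it depends on Airflow.

```python
import subprocess
import sys


class CommandFailed(RuntimeError):
    """Hypothetical error type: the command timed out or exited non-zero."""


def run_command(args, timeout=60):
    """Run a command, return its stdout, and raise CommandFailed on failure."""
    try:
        result = subprocess.run(
            args,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        raise CommandFailed(f"timed out after {timeout}s: {args}") from exc

    if result.returncode != 0:
        raise CommandFailed(
            f"exit code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout.strip()


if __name__ == "__main__":
    # The "job" here is just a Python one-liner standing in for real work.
    print(run_command([sys.executable, "-c", "print('job done')"]))
```

The scheduler only needs to see whether this raised or not; the business logic lives in whatever the command actually runs.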

The tricky bit is when your DAG crosses platforms, but that's always a problem. If anything, it's easier to solve when the tool scheduling the tasks isn't part of the platform. (Note, however, that Airflow is not a tool for solving dataflow, though some glue code in Python often works wonders.)
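
For what it's worth, the kind of glue code I mean can be as small as a dispatch table. This is only a sketch: the platform names and the `run_on_*` functions are hypothetical stand-ins (real ones would submit work to the platform and wait for it), and `run_pipeline` is not an Airflow API.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical per-platform runners. Each one is implemented once and
# knows how to send a command to its platform and report the result.
def run_on_warehouse(command: str) -> str:
    return f"warehouse ran: {command}"


def run_on_cluster(command: str) -> str:
    return f"cluster ran: {command}"


RUNNERS: Dict[str, Callable[[str], str]] = {
    "warehouse": run_on_warehouse,
    "cluster": run_on_cluster,
}


def run_pipeline(steps: List[Tuple[str, str]]) -> List[str]:
    """Run (platform, command) steps in order; an exception stops the run."""
    results = []
    for platform, command in steps:
        runner = RUNNERS.get(platform)
        if runner is None:
            raise ValueError(f"no runner for platform {platform!r}")
        results.append(runner(command))
    return results


if __name__ == "__main__":
    steps = [("warehouse", "load raw data"), ("cluster", "train model")]
    for line in run_pipeline(steps):
        print(line)
```

Because the scheduler sits outside every platform, adding a new platform means adding one runner, not rewriting the steps.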