by blakeburch
1588 days ago

I think you're 100% right that the tasks Airflow can accomplish are being unbundled by tools in the modern data stack, but that doesn't erase the need for tools like Airflow. Sure, you can now write less code to load your data, transform it, and send it out to other tools. But as the unbundling continues, the end result is more fragmentation and fragility in how teams manage their data.

Data teams I talk to can't turn to any single location to see every touchpoint their data goes through. They rely on each tool's independent scheduler and hope that everything runs at the right time without errors. If something breaks, bad data gets deployed, and it becomes a mad scramble to figure out which tool caused the error and which reports, dashboards, ML models, etc. were impacted downstream.

While these unbundled tools can get you 90% of the way to your goal, you'll inevitably hit a case where your use case or SaaS tool isn't supported. Every time I've run into this, the team ended up writing and maintaining its own custom scripts to fill the gap. Now you have your unbundled tool plus your custom script. Why not manage all of the tools and your scripts from one place to begin with?

Unbundling is the reality, but this new era of data technology will still need data orchestration tools that provide a centralized view of your data workflows, whether that's Airflow or one of the new players in the space.

(Disclosure: I'm a co-founder of https://www.shipyardapp.com/, building better data orchestration for modern data teams)
No amount of tooling will make data transformation a painless process; all you end up doing is burying the business logic under so many layers of abstraction that it becomes impossible for anyone to understand.