by blakeburch 1418 days ago
I really agree with Shift 2 (“We unblock analysts” to “We enable everyone”). The problem is that Airflow (like most other OSS orchestrators) is overkill for the majority of data practitioners. These tools lock workflow development into Python, forcing you to mix platform logic with the business logic you actually want to execute. The complexity to get started building workflows is too high, infrastructure challenges always crop up, and the system itself is a black box for anyone non-technical.

> The tool data engineers need to be effective in this new world does not run scripts, it organizes systems.

100%. You'll still need to run independent scripts, but today's data challenges focus on "how do I connect the stages of data operations together?" Teams need to figure out how to connect data ingestion -> data transformation -> data visualization -> alerting and reporting -> ML model deployment -> metadata + catalogs -> data augmentation -> API actions.

The larger goal of orchestration is to prevent downstream processes from running if the data being processed upstream fails. Each stage could be performed with a series of scripts, a SaaS tool, or a mix. Each team is responsible for their own stages, but they need to know how their work connects to the larger picture so when something goes wrong, there's ownership and clarity that drives a quick resolution. Unfortunately, this still doesn't exist in most organizations because the current tooling isn't solving the orchestration and visualization of connected systems super effectively. It's instead enabling one-off, disconnected data processes.
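
To make the "don't run downstream if upstream fails" idea concrete, here's a rough Python sketch. It isn't how Airflow, Shipyard, or any particular tool works; the stage names, owners, and the `run_pipeline` helper are all made up for illustration, and it assumes a simple linear chain of stages.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stage definition: (stage name, owning team, function to run).
Stage = tuple[str, str, Callable[[], None]]


def run_pipeline(stages: list[Stage]) -> dict[str, str]:
    """Run stages in order; after the first failure, skip everything downstream."""
    status: dict[str, str] = {}
    failed_stage = None
    for name, owner, func in stages:
        if failed_stage is not None:
            # An upstream stage failed, so don't process bad or missing data.
            status[name] = f"skipped (upstream '{failed_stage}' failed)"
            continue
        try:
            func()
            status[name] = "success"
        except Exception as exc:
            # Record who owns the failing stage so it's clear who should act.
            status[name] = f"failed, owner={owner}: {exc}"
            failed_stage = name
    return status


# Illustrative stand-ins for real work (a script, a SaaS API call, etc.).
def ingest() -> None:
    pass


def transform() -> None:
    raise ValueError("null values in order_id")


def report() -> None:
    pass


result = run_pipeline([
    ("ingest", "data-eng", ingest),
    ("transform", "analytics", transform),
    ("report", "bi", report),
])
for stage_name, outcome in result.items():
    print(f"{stage_name}: {outcome}")
```

The point is that the status of every stage, plus who owns the one that broke, lives in a single place instead of being scattered across disconnected jobs.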

Disclaimer: I built Shipyard (www.shipyardapp.com) to address many of these concerns by making it simpler to connect data tools and quickly automate and act on data.