by blakeburch 1487 days ago

Love your observation about tying the workflow to Airflow.

One of my biggest annoyances in the orchestration space is that teams mix business logic with platform logic while still touting "lack of vendor lock-in" because the tool is open source. Once you're importing Airflow-specific operators into your script and changing the underlying code to make sure it works on the platform (XCom, task decorators, etc.), you are directly locking yourself in and making edits down the road even more difficult.

While some of the other players do a better job, their method of "code as workflow" still results in the same problems, where workflows get built as a "mega-script" instead of as modular components.

I'm a co-founder at Shipyard, a lightweight hosted orchestrator for data teams. One of our core principles is "Your code should run the same locally as it does on our platform". That means zero changes to your code.

You can define the workflow in a drag and drop editor or with YAML. Each task is its own independent script. At runtime, we automatically containerize each task and spin up ephemeral file storage for the workflow, letting you run scripts one after the other, each in its own virtual environment, while still sharing generated files as if you were running them on your local machine. In practice, that means that individual tasks can be updated (in app or through GitHub sync) without having to touch the entire workflow.
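
To make the idea concrete, here is a rough local sketch in Python. It is not Shipyard's implementation or API: the script names, file names, and the `run_workflow` helper are all hypothetical, and a temporary directory stands in for the ephemeral file storage (no containers or per-task virtual environments here). The point is only that each task is a plain script with no orchestrator imports, and that tasks hand data to each other through ordinary files.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Two hypothetical, independent task scripts. Neither imports anything
# orchestrator-specific; each only reads or writes files in its working directory.
EXTRACT = r'''
from pathlib import Path
Path("numbers.txt").write_text("1\n2\n3\n")
'''

SUMMARIZE = r'''
from pathlib import Path
total = sum(int(n) for n in Path("numbers.txt").read_text().split())
Path("total.txt").write_text(str(total))
'''


def run_workflow(tasks):
    """Run each (name, source) task in order inside one throwaway shared directory.

    Returns a dict mapping each generated file name to its contents.
    """
    with tempfile.TemporaryDirectory() as scripts, tempfile.TemporaryDirectory() as workdir:
        for name, source in tasks:
            script = Path(scripts) / name
            script.write_text(source)
            # Each task runs as its own process; the shared cwd is the
            # stand-in for ephemeral workflow storage (an assumption of this sketch).
            subprocess.run([sys.executable, str(script)], cwd=workdir, check=True)
        return {p.name: p.read_text() for p in Path(workdir).iterdir()}


if __name__ == "__main__":
    print(run_workflow([("extract.py", EXTRACT), ("summarize.py", SUMMARIZE)]))
```

Because the scripts only know about files, swapping out `summarize.py` for a different version does not require touching `extract.py` or the runner.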

I'm biased, but it seems crazy to me that so many engineers are willing to spend hours fighting the configuration of their orchestration platform rather than focusing on solving the problems at hand with code.