by FridgeSeal 2135 days ago
All that I've heard about Airflow is that it's intricately coupled to Python, and can be finicky to maintain as it contains a fair few moving parts.

I've previously used Argo Workflows, which I prefer because I already have a Kubernetes environment, and because it runs containers, it's totally language-agnostic, which I think is a huge benefit. It also has a huge number of features for defining and controlling workflows. The downside is that its configuration/definition YAMLs can get large and a bit messy (as YAMLs are wont to do); however, templated workflows are coming soon, which should hopefully reduce the noise.

Personally, I'm reluctant to use anything that re-implements its own full scheduling system instead of hooking into a pre-existing (and probably more bulletproof) one (i.e. the K8s scheduler), or anything that _requires_ me to write all of my ETL/schedulable code in Python.

1 comments

Argo implements its own scheduler AFAIK; otherwise, how would it manage dependencies and the execution graph? The part that Argo uses K8S for is orchestration, which Airflow can do as well with the KubernetesPodOperator, but it's not a cloud-native solution, and it spins up the whole scheduler and backend for each task.
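To make the "manage dependencies and the execution graph" part concrete, here is a minimal sketch of what any workflow engine has to do before it hands work to an orchestrator: turn a dependency graph into batches of tasks that are safe to run in parallel. This is not Argo's or Airflow's actual code; the `execution_batches` helper and the task names are made up for illustration, and it only relies on `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group tasks into batches whose dependencies are already satisfied.

    `dependencies` maps each task name to the set of tasks it depends on.
    Tasks inside one batch do not depend on each other, so they could be
    launched in parallel (for example, as separate pods).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


# Hypothetical ETL graph: task -> tasks it depends on.
example_graph = {
    "extract": set(),
    "clean": {"extract"},
    "features": {"clean"},
    "report": {"clean"},
    "publish": {"features", "report"},
}

batches = execution_batches(example_graph)
print(batches)
```

Deciding *what* may run next is the workflow engine's job; deciding *where* each container runs can still be left to the K8s scheduler.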

We tried to make Polyaxon[0] work with Airflow for machine-learning-specific workflows, but it was very painful, and Airflow does not have good state/artifact management, which leaves users to tweak things themselves. We ended up building a simple abstraction on top of K8S instead, which was much easier, to provide features for parallel execution, dependencies, failure handling, retries, ... as well as handling ML-specific graphs such as hyperparameter tuning and distributed scheduling.

[0]: https://github.com/polyaxon/polyaxon
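For anyone wondering what "dependencies, failure handling, retries" boils down to, here is a rough sketch of the idea. It is not Polyaxon's implementation: `run_graph`, the task names, and the status strings are all hypothetical, and it runs tasks sequentially in-process rather than as pods on K8S.

```python
from graphlib import TopologicalSorter


def run_graph(tasks, dependencies, max_retries=2):
    """Run callables in dependency order and return {task_name: status}.

    A task is retried up to `max_retries` times after its first failure.
    If any upstream task did not succeed, the task is skipped.
    """
    status = {}
    # static_order() yields every task after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(status[dep] != "succeeded" for dep in upstream):
            status[name] = "skipped"
            continue
        for _attempt in range(max_retries + 1):
            try:
                tasks[name]()
                status[name] = "succeeded"
                break
            except Exception:
                status[name] = "failed"  # stays "failed" if retries run out
    return status


# Hypothetical ML pipeline: "train" fails once, "validate" always fails.
calls = {"train": 0}


def load():
    pass


def train():
    calls["train"] += 1
    if calls["train"] < 2:
        raise RuntimeError("transient failure")


def validate():
    raise ValueError("always fails")


def deploy():
    pass


example_tasks = {"load": load, "train": train, "validate": validate, "deploy": deploy}
example_deps = {
    "load": set(),
    "train": {"load"},
    "validate": {"train"},
    "deploy": {"validate"},
}

result = run_graph(example_tasks, example_deps, max_retries=2)
print(result)
```

In this toy run, `train` recovers on its second attempt, `validate` exhausts its retries, and `deploy` is skipped because its upstream failed.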

Oh interesting, TIL.

By the way, Polyaxon looks awesome. I’ve been wanting to try it for a while, but alas, I just don’t have any machine learning projects in the pipeline at the moment.