by FridgeSeal
2135 days ago
All that I've heard about Airflow is that it's tightly coupled to Python, and can be finicky to maintain as it contains a fair few moving parts. I've previously used Argo Workflows, which I prefer because I already have a Kubernetes environment, and because it runs containers it's totally language-agnostic, which I think is a huge benefit. It also has a huge number of features for defining and controlling the workflows. The downside is that its configuration/definition YAMLs can get large and a bit messy (as YAMLs are wont to do) - however, templated workflows are coming soon, which should hopefully reduce the noise. Personally I'm reluctant to use anything that re-implements its own full scheduling system instead of hooking into a pre-existing (and probably more bulletproof) one (i.e. the K8s scheduler), or anything that _requires_ me to write all of my ETL/schedulable code in Python.
|
We tried to make Polyaxon[0] work with Airflow for machine-learning-specific workflows, but it was very painful, and Airflow does not have good state/artifact management, which leaves users to hack around the gaps themselves. We ended up building a simple abstraction on top of K8s instead, which was much easier. It provides parallel execution, dependencies, failure handling, retries, and so on, and also handles ML-specific graphs such as hyperparameter tuning and distributed scheduling.
[0]: https://github.com/polyaxon/polyaxon
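
To make "parallel executions, dependencies, failure handling, retries" a bit more concrete, here is a rough sketch of what such a thin layer might look like. It is not Polyaxon's code and it does not talk to Kubernetes at all: tasks are plain Python callables run in threads, and `run_dag`, `max_retries` and `max_workers` are made-up names for illustration only. It assumes Python 3.9+ for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_retries=2, max_workers=4):
    """Run callables in dependency order, retrying each one on failure.

    tasks: dict mapping task name -> zero-argument callable (hypothetical shape)
    deps:  dict mapping task name -> set of upstream task names
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}

    def attempt(name):
        # One initial try plus up to max_retries retries.
        for retries_left in range(max_retries, -1, -1):
            try:
                return tasks[name]()
            except Exception:
                if retries_left == 0:
                    raise  # retries exhausted: let the failure propagate

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task whose upstreams are done can run in parallel.
            ready = sorter.get_ready()
            futures = {name: pool.submit(attempt, name) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()  # re-raises on final failure
                sorter.done(name)  # unblocks downstream tasks
    return results
```

A real version on K8s would presumably submit a pod or job per task instead of calling a function, and would need somewhere to persist state and artifacts between steps, which is exactly the part we found missing.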