by tsanikgr
2565 days ago
We took a big productivity hit when we were using Airflow, because running pipelines there carries significant overhead. We think kedro is easier than Airflow and needs less setup:
- You don't need a scheduler, a database, or any initial setup. Instead, kedro provides the `kedro new` command, which creates a project for you that runs out of the box (optionally with a small example pipeline).
- You can run your pipelines as simple Python applications, making it easy to iterate in IDEs or terminals
- Tasks are simple Python functions instead of operators
- Datasets are first-class citizens. You don't need to explicitly define dependencies between the tasks: they are resolved according to what each task produces and consumes
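To make the last two points concrete, here is a minimal sketch in plain Python of how execution order can be derived purely from dataset names. This is not kedro's API or its actual implementation; `TASKS`, `run`, `clean`, and `total` are made-up names for illustration, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


# Tasks are plain Python functions with no framework-specific base class.
def clean(raw):
    return [x for x in raw if x is not None]


def total(cleaned):
    return sum(cleaned)


# Hypothetical task table: (function, input dataset names, output dataset name).
# The order here is deliberately "wrong" to show that it does not matter.
TASKS = [
    (total, ["cleaned"], "total"),
    (clean, ["raw"], "cleaned"),
]


def run(tasks, data):
    """Run tasks in an order inferred from what each one consumes and produces."""
    # Map each dataset name to the index of the task that produces it.
    producer = {output: i for i, (_, _, output) in enumerate(tasks)}
    # A task depends on every task that produces one of its inputs.
    graph = {
        i: {producer[name] for name in inputs if name in producer}
        for i, (_, inputs, _) in enumerate(tasks)
    }
    data = dict(data)  # do not mutate the caller's dict
    for i in TopologicalSorter(graph).static_order():
        func, inputs, output = tasks[i]
        data[output] = func(*(data[name] for name in inputs))
    return data


result = run(TASKS, {"raw": [1, None, 2, 3]})
print(result["total"])  # 6
```

No edge between `clean` and `total` is ever declared; it falls out of the fact that one produces `cleaned` and the other consumes it.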
We also think that a big differentiating factor is the `DataCatalog`. Being able to define in YAML files where your data is and how it is stored and loaded means that the same code will run in any environment, given the appropriate configuration files. This makes testing, and moving from development to production, much easier.

(Disclaimer: I am one of the lead developers of kedro.)

We hope that you give it a try and give us feedback :)