

by tsanikgr 2612 days ago

Our productivity took a big hit when we were using Airflow, as running pipelines there carries significant overhead.

We think kedro is easier than Airflow and needs less setup:

  - You don't need a scheduler, a database, or any initial setup. Instead, kedro provides the `kedro new` command, which creates a project for you that runs out of the box (optionally with a small example pipeline).
  - You can run your pipelines as simple Python applications, making it easy to iterate in IDEs or terminals.
  - Tasks are simple Python functions instead of operators.
  - Datasets are first-class citizens. You don't need to explicitly define dependencies between tasks: they are resolved according to what each task produces/consumes.
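
To make the last point concrete, here is a minimal plain-Python sketch of resolving task order from dataset names. It is not kedro's API: `Node`, `run`, `clean` and `total` are hypothetical names made up for illustration, and a real implementation would likely do a proper topological sort.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """Hypothetical wrapper: a plain function plus the dataset names it reads and writes."""
    func: Callable
    inputs: List[str]
    outputs: List[str]


def run(nodes: List[Node], data: Dict[str, object]) -> Dict[str, object]:
    """Run every node as soon as all of its input datasets are available."""
    data = dict(data)
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            raise ValueError("missing input dataset or circular dependency")
        for n in ready:
            result = n.func(*[data[name] for name in n.inputs])
            if len(n.outputs) == 1:
                result = (result,)
            data.update(zip(n.outputs, result))
            pending.remove(n)
    return data


# Tasks are ordinary functions.
def clean(raw):
    return [x for x in raw if x is not None]


def total(cleaned):
    return sum(cleaned)


# No explicit "total depends on clean": the shared name "cleaned" implies it.
nodes = [
    Node(total, ["cleaned"], ["total"]),  # listed first, still runs second
    Node(clean, ["raw"], ["cleaned"]),
]
result = run(nodes, {"raw": [1, None, 2, 3]})
```

The order in which the nodes are listed does not matter here; the only thing tying `total` to `clean` is the dataset name they share.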

We also think that a big differentiating factor is the `DataCatalog`. Being able to define in YAML files where your data is and how it is stored/loaded means that the same code will run in any environment given the appropriate configuration files.

This makes testing & moving from development to production much easier.
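
A rough sketch of the idea, again in plain Python rather than kedro's actual `DataCatalog`: the `Catalog` class, the `type`/`filepath` keys and the file names below are assumptions for illustration, and the dicts stand in for what would be parsed from per-environment YAML files.

```python
import csv
import json
import os
import tempfile


def load_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def load_json(path):
    with open(path) as f:
        return json.load(f)


# Maps a storage type named in the config to the function that loads it.
LOADERS = {"csv": load_csv, "json": load_json}


class Catalog:
    """Hypothetical catalog: looks up where and how a named dataset is stored."""

    def __init__(self, config):
        self._config = config

    def load(self, name):
        entry = self._config[name]
        return LOADERS[entry["type"]](entry["filepath"])


# Pipeline code only knows the dataset name, not the path or the format.
def count_customers(catalog):
    return len(catalog.load("customers"))


# Set up two fake environments with differently stored data.
tmp = tempfile.mkdtemp()
dev_path = os.path.join(tmp, "customers.csv")
prod_path = os.path.join(tmp, "customers.json")
with open(dev_path, "w", newline="") as f:
    f.write("id,name\n1,Ada\n2,Grace\n")
with open(prod_path, "w") as f:
    json.dump([{"id": 1}, {"id": 2}, {"id": 3}], f)

dev = Catalog({"customers": {"type": "csv", "filepath": dev_path}})
prod = Catalog({"customers": {"type": "json", "filepath": prod_path}})

dev_count = count_customers(dev)    # same code, development config
prod_count = count_customers(prod)  # same code, production config
```

`count_customers` never changes between environments; only the configuration passed to the catalog does.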

(Disclaimer - I am one of the lead developers of kedro)

We hope that you give it a try and give us feedback :)