tl;dr, if you really dig past the marketing (from the FAQ (1)):

> We see Airflow and Luigi as complementary frameworks: Airflow and Luigi are tools that handle deployment, scheduling, monitoring and alerting. Kedro is the worker that should execute a series of tasks, and report to the Airflow and Luigi managers.

> Create the data transformation steps as pure Python functions

Personally, I feel mystified why you would use something like this rather than a more mature product like say, Spark, that natively supports clustering, etc, which is what I would really like to see in the FAQ.

Is it a processing solution? Not really, since it suggests you can offload the heavy lifting to an engine, e.g. Spark. An orchestrator? Apparently not, because that's a complementary product. So... is it, like, a configuration management tool? It's pretty hard for me to see the use case.

1. https://kedro.readthedocs.io/en/latest/06_resources/01_faq.h...
I actually had the same questions when I was first introduced to Kedro! In my case, I didn't understand the value proposition over something like Apache Beam. After using it, I feel like Kedro provides:
Additionally, it aligns well with standards we have internally, like data layering. (edit: Apparently this is also part of the FAQ: https://kedro.readthedocs.io/en/latest/06_resources/01_faq.h... Who knew!)

> Personally, I feel mystified why you would use something like this rather than a more mature product like say, Spark, that natively supports clustering, etc, which is what I would really like to see in the FAQ.
I'd say 80-90% of projects at QuantumBlack use (Py)Spark, so we've built out `SparkDataSet`s, `pandas_to_spark` and `spark_to_pandas` utility decorators, etc. There's a brief integration tutorial here: https://github.com/quantumblacklabs/kedro/tree/develop/kedro...
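For anyone wondering what a conversion decorator of that kind amounts to, here is a minimal sketch of the general pattern in plain Python. It is not Kedro's implementation: `FakeSparkFrame` and `rows_to_fake_spark` are hypothetical stand-ins I made up so the example runs without pandas or Spark installed. The point is only that the node function stays a pure function while the decorator handles the type conversion at the boundary.

```python
from functools import wraps


class FakeSparkFrame:
    """Hypothetical stand-in for a Spark DataFrame (illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)


def rows_to_fake_spark(func):
    """Hypothetical decorator: wrap every list argument in a FakeSparkFrame
    before calling the decorated function. Other arguments pass through."""

    def convert(value):
        return FakeSparkFrame(value) if isinstance(value, list) else value

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        new_args = [convert(a) for a in args]
        new_kwargs = {k: convert(v) for k, v in kwargs.items()}
        return func(*new_args, **new_kwargs)

    return wrapper


@rows_to_fake_spark
def count_rows(frame):
    """A pure function that only ever sees the converted type."""
    return len(frame.rows)


result = count_rows([{"a": 1}, {"a": 2}])
```

The caller passes plain rows and the function body works with the frame type, so the same transformation can be reused wherever the input happens to arrive in the "wrong" format.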
Full disclosure: I'm a data engineer at QuantumBlack (if it wasn't obvious already!)