|
|
|
|
|
by caravel
3394 days ago
|
|
[Airflow author here] one of the main differences between Airflow and Luigi is the fact that in Airflow you instantiate operators to create tasks, where with Luigi you derive classes to create tasks. This means it's more natural to create tasks dynamically in Airflow. This becomes really important if want to build workflows dynamically from code [which you should sometimes!]. A very simple example of that would be an Airflow script that reads a yaml config file with a list of table names, and creates a little workflow for each table, that may do things like loading the table into a target database, perhaps apply rules from the config file around sampling, data retention, anonymisation, ... Now you have this abstraction where you can add entries to the config file to create new chunks of workflows without doing much work. It turns out there are tons of use cases for this type of approach. At Airbnb the most complex use case for this is around experimentation and A/B testing. I gave a talk at a Python meetup in SF recently talking about "Advanced data engineering patterns using Apache Airflow", which was all about dynamic pipeline generation. The ability to do that is really a game changer in data engineering and part of the motivation behind writing Airflow the way it is. I'm planning on giving this talk again, but maybe I should just record it and put it on Youtube. It's probably a better outlet than any conference/meetup... |
|