by caravel 3396 days ago
[author] Airflow is not a data flow engine. You can use it for some of that, but we typically defer data transformations to external engines (Spark, Hive, Cascading, Sqoop, Pig, ...) and use Airflow to coordinate them.

We operate at a higher level: orchestration. If we were to start using Apache Beam at Airbnb (and we very well may soon!), we'd use Airflow to schedule and trigger batch Beam jobs alongside the rest of our jobs.
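
To make the orchestration vs. data flow distinction concrete, here is a rough sketch in plain Python. It is not Airflow's API, just an illustration of the idea: the orchestrator only launches external commands in dependency order and checks their exit codes, and never touches the data itself. The job names, commands, and the `run_all` helper are all made up for the example; each command stands in for a real engine invocation.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each command is a placeholder for an external engine
# invocation (a Sqoop import, a batch Beam pipeline launch, a Hive query).
JOBS = {
    "extract": [sys.executable, "-c", "print('sqoop-style extract')"],
    "beam_batch": [sys.executable, "-c", "print('batch beam job')"],
    "report": [sys.executable, "-c", "print('hive-style report')"],
}

# Map each job to the set of jobs that must finish before it starts.
DEPENDENCIES = {
    "extract": set(),
    "beam_batch": {"extract"},
    "report": {"beam_batch"},
}


def run_all(jobs, dependencies):
    """Trigger each job as a subprocess in dependency order; return the order run."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        # The orchestrator only launches the command and checks its exit code;
        # the transformation itself happens inside the external process.
        subprocess.run(jobs[name], check=True)
        order.append(name)
    return order


if __name__ == "__main__":
    print(run_all(JOBS, DEPENDENCIES))
```

A real scheduler adds the parts this leaves out (calendar-based scheduling, retries, parallelism, monitoring), but the division of labour is the same: the batch Beam job is just one more node in the graph.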

1 comment

Thanks, that's really interesting. The usage of 'pipeline' to describe both sorts of system made me think there was a lot of overlap, but I'm understanding now how they are complementary.