by dimberman 2004 days ago
Hi y'all! Airflow PMC here!

Feel free to AMA about Airflow's new features/the roadmap going forward!


Are there any plans for packaging DAGs in Docker containers, similar to what Prefect does?

It would be perfect to have separate dependencies for different DAGs; otherwise we always end up with a pile of everything ever needed and no clear way to remove obsolete packages from the setup.

So there are a few options for that if you're interested!

1. If you're using the KubernetesExecutor, you can point to custom images for individual tasks. This will primarily work if you're storing DAGs in git or on a volume (or if you want to handle baking the DAGs into the different images yourself).

2. You can use custom images in KEDA queues. This way you can simply point to a queue for all tasks in that DAG and they will run in that environment.

3. You can use the KubernetesPodOperator. Now that the KubernetesPodOperator allows for templating, it would be pretty easy to create a template for a pod and just inject different commands for different steps.

Hope that helps!

Thanks!

That's not exactly what I was looking for, though, because every listed approach injects technical complexity into the middle of my business logic.

I.e., if I have two consecutive tasks, I have to define them as separate scripts or commands, package them, upload them, and then orchestrate them in a completely different place.

In Prefect, by contrast, I get all the niceness of writing almost plain Python (as with the new tasks API in Airflow), and then I can package and distribute the whole thing in a Docker image with a single command. It really matters!

Have you tried Flyte.org?

I did not, thank you for the tip!

Oh, you're welcome! Join the Slack channel and ask for help. The community is growing every day. Here are some examples of using it in Python: https://flytecookbook.readthedocs.io/en/latest/

Hey, wanted to give a bit of feedback.

We've found Airflow and ECS Fargate to be a great combination for running ETLs. It keeps Airflow small and dumb, and lets the Fargate containers do the heavy or complicated lifting in the language of the developer's choice.

We'd really appreciate it if the ECSOperator could be given a bit of attention:

Running a task on FARGATE_SPOT containers is a cheap, convenient option, but it requires passing capacityProviderStrategy in. https://issues.apache.org/jira/browse/AIRFLOW-6604
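For anyone wondering what "passing capacityProviderStrategy in" amounts to at the API level, here is a minimal sketch of the keyword arguments for the ECS RunTask call. It only builds the request dictionary, so it runs without AWS access. The helper name `build_run_task_kwargs` and the cluster, task definition, and subnet values are made up for illustration; this is not how the ECSOperator itself is implemented.

```python
from typing import Any, Dict, List


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    subnets: List[str],
    spot_weight: int = 1,
) -> Dict[str, Any]:
    """Build keyword arguments for the ECS RunTask API using FARGATE_SPOT.

    Hypothetical helper for illustration only.
    """
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        # launchType is deliberately left out: RunTask does not accept it
        # together with capacityProviderStrategy.
        "capacityProviderStrategy": [
            {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
        ],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "DISABLED",
            }
        },
    }


# Placeholder values, not real resources.
kwargs = build_run_task_kwargs(
    "etl-cluster", "nightly-etl:3", ["subnet-0123456789abcdef0"]
)
# With boto3 installed and credentials configured, this would be passed as:
#   boto3.client("ecs").run_task(**kwargs)
```

The main thing to notice is that the capacity provider strategy replaces the launch type rather than accompanying it, which is why an operator that always sends a launch type cannot use FARGATE_SPOT.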

Also, the ECSOperator currently only shows the output logs once the task has finished (which could take hours). It'd be better if the operator could poll the CloudWatch logs during the run rather than wait for it to finish.
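A rough sketch of the polling idea, assuming a client object that behaves like boto3's CloudWatch Logs client (`get_log_events` returning `events` and `nextForwardToken`). The function name `stream_log_events` and the `task_is_running` callback are hypothetical, and error handling (for example, a log stream that does not exist yet) is left out.

```python
import time
from typing import Any, Callable, Iterator, Optional


def stream_log_events(
    logs_client: Any,
    log_group: str,
    log_stream: str,
    task_is_running: Callable[[], bool],
    poll_seconds: float = 10.0,
) -> Iterator[str]:
    """Yield log messages as they appear, until the task has stopped
    and the stream has been read to its end."""
    token: Optional[str] = None
    while True:
        # Check the task state BEFORE reading, so one more read always
        # happens after the task stops and trailing lines are not lost.
        running = task_is_running()
        request = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,
        }
        if token is not None:
            request["nextToken"] = token
        response = logs_client.get_log_events(**request)
        for event in response["events"]:
            yield event["message"]
        new_token = response["nextForwardToken"]
        # Getting the same token back means there is nothing newer to read.
        drained = new_token == token
        token = new_token
        if drained and not running:
            return
        if drained:
            time.sleep(poll_seconds)
```

In an operator, each yielded message could be written to the task log as it arrives, e.g. `for line in stream_log_events(client, group, stream, is_running): log.info(line)`, where `is_running` would wrap a call that checks the ECS task status.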

---

Congratulations on the release, I'm looking forward to upgrading soon, and trying out the new features and syntax!

Thank you for the feedback! I'm gonna pass that on to some AWS experts in the community.

One really nice feature of 2.0 is that the "providers" (hooks, operators, etc.) are now released separately from Airflow itself. So you won't need to upgrade Airflow to get improved AWS operators unless there is a breaking change.

Ditto, running Airflow on AWS ECS Fargate serverless. We did this prior to AWS announcing their Managed Workflows for Apache Airflow[1]. Do you know if and when AWS will be making Airflow 2.0 available in their managed service?

[1] https://aws.amazon.com/managed-workflows-for-apache-airflow/

Have you tried Managed Workflows for Apache Airflow?

I was curious about it, but the pricing page scared me off: the smallest, which runs 50 DAGs, is about $0.49/hr! I couldn't understand why the pricing was that way.

I'm sure they will; I'm not sure about their timeline, though.