by trumpeta 1495 days ago

We operate a (small?) Airflow instance with ~20 DAGs, but one of those DAGs has ~1k tasks. It runs on a k8s/AWS setup with MySQL backing it.

We package all the code in 1-2 different Docker images and then create the DAG. We've faced many issues (logs out of order, missing, random race conditions, random task failures, etc.)

But what annoys me the most is that for that one big DAG, the UI is completely useless: the tree view has insane duplication, the graph view is super slow and hard to navigate, and answering basic questions like "what exactly failed, and which nodes are around it?" is not easy.

4 comments

artwr 1495 days ago

At Airbnb, we were using SubDAGs to try to manage a large number of tasks in a single DAG. This allowed organizing tasks and drilling down into failures more easily, but came with its own challenges.

In more recent versions of Airflow, TaskGroups (https://airflow.apache.org/docs/apache-airflow/stable/concep..., https://www.astronomer.io/guides/task-groups/) were introduced to help with this. Hopefully that helps a bit.

At ~1k nodes in the graph, introspection becomes hard anyway. As others have suggested, breaking it down if possible might be a good idea.

rockostrich 1495 days ago

We had a similar DAG that was the result of migrating a single daily Luigi pipeline to Airflow. I started identifying isolated branches and breaking them off with external task sensors back to the main DAG. This worked but it's a pain in the ass. My coworker ended up exporting the graph to graphviz and started identifying clusters of related tasks that way.
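
The graphviz approach described above can be sketched without Airflow at all. The snippet below is a minimal illustration, assuming the task dependencies have already been dumped into a plain dict of task id to downstream task ids. The `edges` data and both helper names are hypothetical, not taken from the poster's pipeline. It produces DOT text that graphviz can render and groups tasks into weakly connected components, which is one simple way to spot branches that share no dependencies with the rest of the DAG.

```python
from collections import defaultdict


def to_dot(edges):
    """Render a {task_id: [downstream_task_id, ...]} mapping as graphviz DOT text."""
    lines = ["digraph dag {"]
    for upstream in sorted(edges):
        lines.append(f'  "{upstream}";')
        for downstream in sorted(edges[upstream]):
            lines.append(f'  "{upstream}" -> "{downstream}";')
    lines.append("}")
    return "\n".join(lines)


def isolated_branches(edges):
    """Group tasks into weakly connected components (edge direction ignored)."""
    neighbours = defaultdict(set)
    for upstream, downstreams in edges.items():
        neighbours[upstream]  # register tasks that have no edges at all
        for downstream in downstreams:
            neighbours[upstream].add(downstream)
            neighbours[downstream].add(upstream)

    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        # Depth-first walk over the undirected neighbour sets.
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


# Hypothetical example data: two branches with no shared tasks.
edges = {
    "extract_a": ["transform_a"],
    "transform_a": ["load_a"],
    "extract_b": ["load_b"],
}
dot_text = to_dot(edges)
branches = isolated_branches(edges)
```

Saving `dot_text` to a file and rendering it with the graphviz `dot` tool gives the picture; each entry in `branches` is a candidate for its own DAG. On a real 1k-task graph most tasks will likely sit in one big component, so this only finds the fully disconnected pieces.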

mywittyname 1495 days ago

I've not had the best luck with ExternalTaskSensors. There have been some odd errors like execution failing at 22:00:00 every day (despite the external task running fine).

mywittyname 1495 days ago

Also, the @task annotation provides no facilities to name tasks. So if you like to build reusable tasks (as I do), you end up with my_generic_task__1, my_generic_task__2, my_generic_task__n. I've tried a few hacks to dynamically rename these, but I just ended up bringing down my entire staging cluster.

artwr 1495 days ago

`your_task.override(task_id="your_generated_name")` not working for you?

mywittyname 1495 days ago

I got pretty excited when I read this response, but no, it doesn't work. I'm not sure how this would work since annotated tasks return an xcom object.

Can you point me to the documentation on this function? It's possible I'm not using it correctly.

I can do something like this, which works locally, but breaks when deployed:

    res = annotated_task_function(...)
    res.operator.task_id = 'manually assigned task id'

flowair 1495 days ago

    @task.python(task_id="this_is_my_task_name")
    def my_func():
        ...

mywittyname 1495 days ago

This still has the problem that, when you call my_func multiple times in the same DAG, the resulting tasks will be labelled my_func, my_func__1, my_func__2, ...
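
One way around the numbered suffixes, where `override` is available in your Airflow version, is to derive an explicit id per call and hand it to `override(task_id=...)` as suggested earlier in the thread. The helper below is a minimal sketch of only the naming part, in plain Python. `make_task_id` and the table names are hypothetical, and it assumes Airflow's usual rule that a task id may contain only alphanumerics, dashes, dots and underscores, up to 250 characters.

```python
import re

MAX_TASK_ID_LENGTH = 250  # assumed Airflow limit for task ids


def make_task_id(base, label):
    """Build a readable task id such as 'load_table__orders' from a base name and a label."""
    # Replace anything outside [A-Za-z0-9_.-] with an underscore.
    safe_label = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(label)).strip("_")
    task_id = f"{base}__{safe_label}" if safe_label else base
    return task_id[:MAX_TASK_ID_LENGTH]


# Hypothetical usage: one id per table instead of my_func, my_func__1, ...
task_ids = [make_task_id("load_table", name) for name in ["orders", "order items", "users/eu"]]

# Inside a DAG definition this would be used roughly as (not executed here):
#     for name in tables:
#         my_func.override(task_id=make_task_id("load_table", name))(name)
```

Two labels that sanitize to the same string would still collide, so the labels need to be distinct after cleaning.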

flowair 1493 days ago

How about the dynamic task mapping that is now available in 2.3?

suifbwish 1495 days ago

Does this imply file metadata content can affect the access performance of those files even for operations that do not directly concern the metadata?
