There are some great ideas in this post, namely the focus on human attention as the limited resource and the recursive decomposition of tasks. There were also a few points which I found unclear. You describe recursive decomposition as a "tree of DAGs ...". What do you mean by this? Is each node in the tree a DAG? Secondly, I agree that the tree is exponentially faster than one agent scanning a flat list, but are you not running exponentially more "scanning" agents? If human attention is the only limiting factor this is fine, but is this not a problem if you have limited compute?
Spawning exponentially more scanning agents is an interesting trade-off as we scale to massive, continuously growing tasks. In my initial experiments with well-defined tasks, latency has been low enough that the overhead has not been worth investigating yet.
It would turn a slow search into a highly parallelised "MapReduce" problem. You trade a brief, massive burst of machine execution to keep wall-clock latency incredibly low for the human waiting at the top.
A tree structure means these scanning agents don’t just run wild. High-level nodes could aggressively prune entire branches the moment a scanning agent reports a dead end.
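To make the trade-off concrete, here is a rough sketch (not the post's actual implementation) that counts how many scanning agents a tree would launch versus how many sequential rounds the human waits for. The `Node` class, the `dead_end` flag, and the `full_tree` helper are hypothetical names invented for this illustration, and it assumes all siblings are scanned in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical flag standing in for "a scanning agent reported a dead end here".
    dead_end: bool = False
    children: list["Node"] = field(default_factory=list)


def scan(node: Node) -> tuple[int, int]:
    """Return (agents_run, parallel_rounds) for the subtree rooted at node.

    Assumes one agent per visited node and that siblings run concurrently,
    so rounds grow with depth while agents grow with the number of nodes.
    """
    if node.dead_end or not node.children:
        # A dead end is still scanned once, but its whole subtree is pruned.
        return 1, 1
    results = [scan(child) for child in node.children]
    agents = 1 + sum(a for a, _ in results)
    rounds = 1 + max(r for _, r in results)
    return agents, rounds


def full_tree(depth: int, branching: int, prefix: str = "n") -> Node:
    """Build a complete tree with the given depth and branching factor."""
    if depth == 0:
        return Node(prefix)
    kids = [full_tree(depth - 1, branching, f"{prefix}.{i}") for i in range(branching)]
    return Node(prefix, children=kids)


tree = full_tree(depth=3, branching=2)
print(scan(tree))  # (15, 4): 15 agents in total, but only 4 sequential rounds

tree.children[0].dead_end = True  # one top-level branch reports a dead end
print(scan(tree))  # (9, 4): pruning cuts agents, not the worst-case latency
```

So the compute concern is real: total agents scale with the number of nodes, while wall-clock rounds scale with depth. Pruning only helps the first number.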
Human bandwidth and context switching definitely seem to be a major bottleneck at the moment. But if you schedule several tasks concurrently in order to batch questions, I imagine there is an upper limit on how many tasks can run in parallel, especially if there are dependencies between them. I think we need to start charging employers per token of human attention if we're optimising mental throughput. I'm going to miss doomscrolling while Claude writes the plan though :(
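One way to picture that upper limit: if the tasks form a dependency DAG, a simple wave-by-wave schedule can only run the tasks whose prerequisites are already done. Below is a small sketch using the standard library's `graphlib.TopologicalSorter`; the task names and the `waves` helper are made up for illustration.

```python
from graphlib import TopologicalSorter


def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run concurrently.

    deps maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    out = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with all prerequisites done
        out.append(ready)
        sorter.done(*ready)
    return out


# Hypothetical task graph: everything waits on the plan, tests wait on api and ui.
deps = {
    "plan": set(),
    "api": {"plan"},
    "ui": {"plan"},
    "docs": {"plan"},
    "tests": {"api", "ui"},
}
schedule = waves(deps)
print(schedule)                       # [['plan'], ['api', 'docs', 'ui'], ['tests']]
print(max(len(w) for w in schedule))  # 3
```

Under this schedule, no more than three of the five tasks are ever in flight at once, however many agents you are willing to spawn.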
"upper limit on how many tasks can be run in parallel" - definitely, however at the moment codinig workflows like Claude Code will spawn around 5 agents in parallel at max, but we can hope that dynamic workflows provides a better solution however the intial reports on the coherency of the their multi-agent system seems to be sub-optimal
probably the only way to make something like this affordable. But the way models are trained right now is completely wrong for this anyway. They’re nowhere near good enough at estimating their own uncertainty and it’s RLHF’s fault for these ‘crutch plans’ we get. I guess the point of the architecture is that it shouldn’t need big smart models to work well, but whatever models you use, what’s the post-train for forked execution going to look like? This sounds so expensive to train too