| This is super big news if it’s real. Basically, given an agent with an initial set of predefined actions and a goal, they’re saying “decompose this into steps and pick an action to achieve each step”. Pretty standard stuff. Then they say, hey, if you can’t solve the problem with those actions (i.e. you failed repeatedly when attempting to solve it), write some arbitrary generic Python code and use that as your action for the next step. Then save that as a new generic action, and slowly build up a library of actions to augment the initial set. The thing is, there’s no meaningful difference between the task “write code to solve this task” and “write code to perform this action”; if you can reliably generate code that can, without error, perform arbitrary tasks, you’ve basically solved programming. So… that would be quite a big deal. That would be a real “Devon” that would actually be able to write arbitrary code to solve arbitrary problems. …which makes me a bit sceptical. Still, this seems to have worked reasonably well (as shown by it leading the GAIA leaderboard), so they seem to have done something that works, but I’m left wondering… If you’ve figured out how to get an agent to write error-free deterministic code to perform arbitrary actions in a chain of thought process, why are you pissing around with accumulating a library of agent actions? That’s all entirely irrelevant and unnecessary. Just generate code for each step. So… something seems a bit strange about this. I’d love to see a log of the actual problem / action / code sequences. |
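For anyone skimming, the loop described above can be sketched roughly like this. It is only a guess at the shape of the idea, not the paper's implementation: `solve_step`, `ActionFailed` and `fake_write_action` are hypothetical names, and the "code generator" is a hard-coded stand-in for whatever model call the real system makes.

```python
from typing import Callable, Dict

Action = Callable[[str], str]


class ActionFailed(Exception):
    """Raised by an action that cannot handle the given step."""


def solve_step(step: str, library: Dict[str, Action],
               write_action: Callable[[str], Action]) -> str:
    # Try every action already in the library first.
    for action in library.values():
        try:
            return action(step)
        except ActionFailed:
            continue
    # Nothing worked: generate a new action, save it, then use it.
    new_action = write_action(step)
    library[f"generated_{len(library)}"] = new_action
    return new_action(step)


def upper_only(step: str) -> str:
    # A predefined action that only understands "upper:<text>" steps.
    if not step.startswith("upper:"):
        raise ActionFailed(step)
    return step[len("upper:"):].upper()


def fake_write_action(step: str) -> Action:
    # Stand-in for model-generated code (assumption): always returns
    # an action that handles "reverse:<text>" steps.
    def reverse(s: str) -> str:
        if not s.startswith("reverse:"):
            raise ActionFailed(s)
        return s[len("reverse:"):][::-1]
    return reverse


library: Dict[str, Action] = {"upper": upper_only}
steps = ["upper:abc", "reverse:abc", "reverse:xyz"]
results = [solve_step(s, library, fake_write_action) for s in steps]
```

The third step reuses the action generated for the second one, which is the only thing the library buys you over generating fresh code every time.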
Anyway, this is pretty standard stuff already. In all my agent workflows the agents are able to write their own code and execute it before passing the result to the next agent. It doesn't need to be perfect, since you always have an agent validating the results and sending the task back if necessary.
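A minimal sketch of that write / execute / validate loop, assuming a writer agent that takes the task plus the last feedback and a validator that returns a pass flag plus feedback. `run_with_validation`, `fake_writer` and `fake_validator` are made-up names for illustration, and the two "agents" are plain functions standing in for model calls.

```python
from typing import Any, Callable, Optional, Tuple


def run_with_validation(
    task: str,
    write_code: Callable[[str, Optional[str]], str],
    validate: Callable[[str, Any], Tuple[bool, Optional[str]]],
    max_rounds: int = 3,
) -> Any:
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = write_code(task, feedback)
        scope: dict = {}
        try:
            # NOTE: exec on generated code is unsafe; a real setup
            # would run this in a sandbox or a separate process.
            exec(source, scope)
            result = scope["result"]
        except Exception as exc:
            # Send the error back to the writer and try again.
            feedback = f"error: {exc!r}"
            continue
        ok, feedback = validate(task, result)
        if ok:
            return result
    raise RuntimeError(f"no valid result after {max_rounds} rounds")


def fake_writer(task: str, feedback: Optional[str]) -> str:
    # First attempt is deliberately buggy; the retry is fixed.
    return "result = 2 + '2'" if feedback is None else "result = 2 + 2"


def fake_validator(task: str, result: Any) -> Tuple[bool, Optional[str]]:
    if result == 4:
        return True, None
    return False, f"expected 4, got {result!r}"


answer = run_with_validation("add 2 and 2", fake_writer, fake_validator)
```

The first round fails with a `TypeError`, the error goes back as feedback, and the second round passes validation.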
I haven't read the paper beyond the synopsis, so I might be missing a key takeaway, and I presume it has a lot of additional layers.