| I would disagree here. Building a good, working coding harness with smaller models is really hard. Everything revolves around the limited context size. Tools must be specification-driven to reduce noise and high-temperature hallucinations; tool call shrinking needs to remove errors and the tryouts of different parameter formats (because LLMs always ignore the descriptions in the JSON...); and you have to deal with long-running agents because you can't afford them. In a planner/orchestrator architecture, agent-to-agent communication needs to be summarized, and then you have the messed-up scheduling parts, because you need to prioritize short-running agents and give the planner a tool to wait for the outputs of spawned contractor agents. And that's not even talking about sandbox vs. playground read/write/access policies for tools. Harness engineering, if done correctly, is quite hard. And all of this works 60% of the time, every time. Anyway, that was more or less the summary of the last 6 months of building my exocomp agentic environment. And it's still not satisfying to work with. |
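
To make "tool call shrinking" a bit more concrete, here is a rough sketch of one way it could work: before the history goes back into the small context window, drop failed tool calls that were later superseded by a successful call to the same tool. The names `ToolCall` and `shrink_tool_calls` are hypothetical and only for illustration; this is not how exocomp or any particular harness does it.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str     # name of the tool the model invoked
    args: dict    # parameters the model sent (possibly malformed)
    ok: bool      # True if the call succeeded
    output: str   # result text, or the error message on failure


def shrink_tool_calls(history: list[ToolCall]) -> list[ToolCall]:
    """Drop failed calls that are followed by a later success of the same tool.

    Assumption: once a tool call has succeeded, the earlier failed attempts
    (wrong parameter formats, typos) are just noise for the model.
    """
    kept: list[ToolCall] = []
    succeeded_later: set[str] = set()

    # Walk backwards so that, for each call, we already know whether a later
    # call to the same tool succeeded.
    for call in reversed(history):
        if call.ok:
            succeeded_later.add(call.tool)
            kept.append(call)
        elif call.tool not in succeeded_later:
            # No later success: keep the error so the model can still react.
            kept.append(call)

    kept.reverse()  # restore chronological order
    return kept


history = [
    ToolCall("read_file", {"file": "a.py"}, False, "unknown parameter 'file'"),
    ToolCall("read_file", {"path": ["a.py"]}, False, "'path' must be a string"),
    ToolCall("read_file", {"path": "a.py"}, True, "print('hi')"),
    ToolCall("write_file", {"path": "b.py"}, False, "missing 'content'"),
]
shrunk = shrink_tool_calls(history)
```

In this example the two failed `read_file` attempts are dropped, while the failed `write_file` call stays because nothing has fixed it yet. A real harness would probably want a finer key than the tool name (for instance tool plus target path), which is left out here to keep the sketch short.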