It only seems effective until you start using it for actual work. The biggest issue is context. All tool use creates context, and large code bases come with large context right off the bat. LLMs seem to work unless they are hit with a sizeable context: anything above 10k tokens and the quality seems to deteriorate.

The other issue is that LLMs can go off on a tangent. As context builds up, they forget what their objective was. One wrong turn, and down the rabbit hole they go, never to recover.

The reason I know is that we started solving these problems a year back. We aren't done yet, but we have covered a lot of distance.

[Plug]: Try it out at https://nonbios.ai:
- Agentic memory → long-horizon coding
- Full Linux box → real runtime, not just toy demos
- Transparent → see & control every command
- Free beta, no invite needed. Works with throwaway email (mailinator etc.)
I think this is probably at the heart of the best argument against these things as viable tools.
Once you have sufficiently described the problem such that the LLM won't go the wrong way, you've likely already solved most of it yourself.
Tool use with error feedback sounds autonomous, but you'll quickly find that the error-handling layer is a thin proxy for the human operator's intentions.