Hacker News new | ask | show | jobs
by avereveard 764 days ago
I've encountered two viable cases: instructions are too complex, too many tools, or wildly different processing steps, in which case it semplify a lot the processing to have a few well defined steps each doing their thing, and a coordinator on top, either sequential, or intelligent, that is only focuesed on next step routing.

the other is memory for conversational retrieval. ai memory is still quite limited, especially if there needs to be a lot of token in context, and context too long impede the ability of llm of focus on the task itself, especially if the context is itself a conversation or a request, so spreading the context along a few agents, and propagating the user request among agent, and having those produce answer fragment for another llm to formulate an answer allows to not lose the conversational context without swamping the llm with noise.

the problem tho remains latency as son as you nest them latency explodes as you can only stream the last layer of llm output