Hacker News
by h4kor 974 days ago
A question for people researching LLMs and their capabilities:

Is there any reason to believe that the interaction of multiple agents (using the same model) will yield some emergent property that is beyond the capabilities of the agent model?

I'm not working with LLMs, but my intuition is that whatever these multi agent setups come up with could also be achieved by a single agent just talking to itself, as they all are "just guessing" what the most probable next token is.

10 comments

Since a single inference is limited by context length, a multi-agent model is able to process more context at each step of the reasoning chain, which might improve the overall quality. Also, given how easy it is becoming to fine-tune models, it's likely that multi-agent models will make a lot of sense as a way to split the workload and assign each part to a specialized agent.
> a single inference is limited by context length,

Yes.

> multi-agent model is able to process more context at each step of the reasoning chain

What?

How can a multi agent model have more context at a single step? The single step runs on a single agent. It would literally be the same as a single agent.

The multi agent approach is simply packaging up different “personas” for single steps; and yes, it is entirely reasonable to assume that given N configurations for an agent (different props, different temp, different models even) you would see emergent behaviour that a single agent wouldn’t.

For example, you might have a “creative agent” to scaffold something and a “conservative” agent to fix syntax errors.

…but what are you talking about with different context sizes? I think you’re mixing domain terms; context is the input to an LLM. I don’t know what you’re referring to, but multi agent setups make absolutely no difference to the context size.

Their comment uses two (valid) context lengths: "organizational total" and "single agent." The latter is a subset of the former.

By analogy: no agent can summarize War and Peace, but several agents can, Peace-wise (sorry). Like AI map reduce. The question is thus "why not use one agent for this recursive merger?" Answers maybe being:

1. Different scholars (Russian lit. agents, ...war strategists?, etc) pay attention to different things with valuable insights

2. Multiple readers parallelize well, and some are faster than others

3. Managers can direct talent to (re)read chapters most relevant to their specialties, and coordinate meta-learning and communication

You might not get much mileage out of this approach with book summaries, but other domains are a different story (sorry).
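The "AI map reduce" idea above can be sketched in Python. This is a minimal sketch under assumptions: `call_llm` is a hypothetical stand-in that only truncates its input, `CONTEXT_LIMIT` is an arbitrary character budget, and none of these names come from the thread. It illustrates that the pipeline as a whole can cover a long text while no single call ever receives more than the per-call limit.

```python
from concurrent.futures import ThreadPoolExecutor

CONTEXT_LIMIT = 200  # hypothetical per-call budget, counted in characters for simplicity


def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; it just truncates so the sketch runs offline."""
    if len(prompt) > CONTEXT_LIMIT:
        raise ValueError("prompt exceeds the context limit")
    return prompt[:40]


def chunk(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarize(text: str) -> str:
    # Base case: the text fits in a single call.
    if len(text) <= CONTEXT_LIMIT:
        return call_llm(text)
    # Map: summarize each chunk independently (these calls can run in parallel).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(call_llm, chunk(text, CONTEXT_LIMIT)))
    # Reduce: merge the partial summaries, recursing if they are still too long.
    return summarize("\n".join(partials))
```

Whether the map and reduce steps are run by one persona or by several specialized ones is a separate design choice; the chunking is what gets around the per-call limit.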

I’m not sure what this means.

Are you agreeing or disagreeing?

Yes, multiple agents with different personas will give different takes and may lead to emergent behaviour, eg. discussing the book.

Yes, they could run in parallel.

No, any single multi agent step will not have any more context than any other single step.

If you believe that the Nth prompt to an LLM in a chat between multiple agents has “more context” than the Nth prompt in a chat between a single agent (and itself), you don’t understand how this works.

…or you are choosing to invent your own definition of “context”.

I think this is right in line with the utility of multi-agent models, whether that means distributing tasks to specialized agents trained on domain knowledge or collaborating with context-aware agents. I think context is where we are going to find limitations early on, especially when models are expected to work on live data. Rather than constantly retraining a model, you leverage a model that is already primed through in-context learning based on previous interactions and relevant data.
Context window is fast becoming a non-issue (MemGPT, SPR, sink tokens, etc.)
My understanding is it's about attention.

When you give it a specific role it essentially hones in on the relevant part of the training data. Researcher in X field? Papers from that field get priority in formulating responses and the accuracy of token prediction for contextually relevant tasks goes up.

OTOH, if you try to go 'meta' - ie. you give it a scenario where it imagines a group of scholars chatting with each other, then it hones in on situations where there is a dialogue amongst a group (ie. a play/script).

In a way it is the same thing; agents are mostly an abstraction that makes it easier to know what’s going on.

I think of agents more or less as python classes with a mixture of natural language and code functions. You design them to do something with information they produce, and to interface with other agents or “tools” in some way.

But all the agents can be the same language model under the hood; they are frames used to build different kinds of contexts.
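A rough sketch of the "agents as Python classes" framing, assuming a hypothetical `shared_model` function in place of a real LLM call (the class, field, and persona names are invented for illustration). Both agents call the same model; they differ only in the context they build around each message.

```python
from dataclasses import dataclass, field


def shared_model(prompt: str) -> str:
    """Hypothetical stand-in for one underlying LLM; reports the prompt size."""
    return f"[reply to {len(prompt)} chars of prompt]"


@dataclass
class Agent:
    name: str
    system_prompt: str  # the "persona" that frames every call
    history: list[str] = field(default_factory=list)

    def build_context(self, message: str) -> str:
        """Assemble persona, past turns, and the new message into one prompt."""
        return "\n".join([self.system_prompt, *self.history, message])

    def respond(self, message: str) -> str:
        reply = shared_model(self.build_context(message))
        self.history += [message, reply]
        return reply


# Two personas, one model: the output of one agent becomes input to the other.
creative = Agent("creative", "You scaffold bold first drafts.")
conservative = Agent("conservative", "You fix syntax errors and nothing else.")
draft = creative.respond("Write a function that parses dates.")
fixed = conservative.respond(draft)
```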

And yes I think the idea is that emergent behaviour can be useful. This comes to mind

https://github.com/MineDojo/Voyager

But I think we are still a small ways off from being really smart about agents. My opinion is that we haven’t quite figured out what we are doing yet.

Given we know different prompts perform better on different tasks (via evals, papers, etc), you can think of multiple agents interacting (especially when there's a specialized "router" or orchestrator) as sub problems of a larger task being solved by "agents" specialized for that task - prompts + context crafted for that sub-problem.
We do a ton in louie.ai bc of this:

* sometimes we want an LLM with longer context, faster speed, higher quality, etc: so even in a model family, in the same job, diff model configs

* we do a lot of prompt tuning for agent calls, like what a good Splunk query is, what SQL tables are currently available, what a good chart is, how to use a graph library, ...

* we also do accompanying code-level work, like running a generated python data analysis in a sandbox and feeding back exceptions to the LLM, or checking for parse errors when running a DB query, which feed back to the LLM

* When working directly on data, we might run it through the LLM, which might get into parallel chunked calls, a summary tree, etc, where a single LLM call would be insufficient, costly, slow, etc
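The "feed exceptions back to the LLM" step in the list above might look roughly like this. It is only a sketch and not louie.ai's code: `stub_llm` is a hypothetical model that returns buggy code until it sees an error message, and plain `exec()` stands in for a real sandbox, which it is not.

```python
import traceback


def stub_llm(prompt: str) -> str:
    """Hypothetical model: returns buggy code first, a fix once it sees the error."""
    if "ZeroDivisionError" in prompt:
        return "result = sum(values) / max(len(values), 1)"
    return "result = sum(values) / 0"


def run_with_feedback(task: str, values: list[float], max_attempts: int = 3):
    """Generate code, run it, and re-prompt with the exception until it works."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        code = stub_llm(prompt)
        scope = {"values": values}
        try:
            # NOTE: exec() is not a sandbox; a real system would isolate generated code.
            exec(code, scope)
            return scope["result"], attempt
        except Exception:
            error = traceback.format_exc(limit=1)
            # Feed the failing code and its error back into the next prompt.
            prompt = f"{task}\n\nYour code:\n{code}\n\nIt failed with:\n{error}"
    raise RuntimeError("no working code after retries")
```

The same loop shape applies to the DB parse-error case: swap the `exec` call for the query runner and feed its error text back in.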

https://arxiv.org/abs/2307.07924

at the very least, you can get things done that would be extremely difficult or practically impossible to do with a single instance.

> Is there any reason to believe that the interaction of multiple agents (using the same model) will yield some emergent property that is beyond the capabilities of the agent model?

If you write a short story it's often better to split it into parts (make an outline, write the story, edit the story) than to try to do the whole process at once. The same can be true for LLMs I suppose.
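As a toy illustration of the outline, write, edit split, here is a staged pipeline where each stage is a separate call with its own instruction. `stub_llm` is a hypothetical stand-in that just wraps its input in a stage tag so the order of the calls is visible; the stage names are taken from the comment above.

```python
def stub_llm(system: str, content: str) -> str:
    """Hypothetical model call; tags the output with the stage name."""
    return f"<{system}>{content}</{system}>"


STAGES = ["outline", "draft", "edit"]


def write_story(premise: str) -> str:
    """Run the premise through each stage, feeding every output into the next stage."""
    text = premise
    for stage in STAGES:
        text = stub_llm(stage, text)
    return text
```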

In an LLM sense this would be like the different system prompts are sampling different parts of the training distribution, but I'm not able to validate such a claim or know if someone has validated it before.

From my experience it's a modularization technique. It makes it easier to reason about and improve the system. For example, instead of one big model capable of doing anything, you can separate the system into specialized subsystems with different prompts and improve them over time.
Mixture of experts: Make each model world-class within a single domain. If adding one more common-sense QnA makes the calculus-bot even slightly worse at calculus, don't do it.

https://en.wikipedia.org/wiki/Mixture_of_experts

The “mixture of experts” concept in LLMs is a way of training a single model; it’s not based on training many different models (although that was the idea when the term was originally coined).
It's possible that they can only "wear so many hats" at the same time.
And that is where MoE comes in with some more advancements in routing
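For the single-model sense of mixture of experts mentioned above, routing is commonly described as a learned gate that scores the experts for each input and sends the input only to the top few. The sketch below is a generic toy version of that top-k gating, not any particular model's router: the experts are made-up scalar functions and the gate is a hypothetical one-weight-per-expert linear scorer.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy experts: each is just a function of a scalar input.
EXPERTS = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]


def moe_layer(x: float, gate_weights: list[float], k: int = 2):
    """Route x to the k highest-scoring experts and mix their outputs."""
    scores = [w * x for w in gate_weights]  # toy linear gate, one weight per expert
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    output = sum(probs[i] / norm * EXPERTS[i](x) for i in top)
    return output, top
```

Only the selected experts are evaluated for a given input, which is the main difference from running several full agents side by side.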