by beoberha 767 days ago
From the website linked in the readme:

“A lot of research has been done in this area and we can expect a lot more in 2024 in this space. I promise to share some clarity around where I think this industry is headed. In personal talks I have warned that multi-agent systems are complex and hard to get right. I've seen little evidence of real-world use cases too”

These assistant systems fascinate me, but I just don’t have the time and energy to set something up. I was going to ask if anyone had a good experience with it, but the above makes it sound like there’s not much hope at the moment. Curious what other people’s experiences are.

4 comments

We tried using a multi-agent system for a complex NLP-type task and we found:

- Too many errors that just propagate on top of each other. If a single agent in the chain generates something even a little bit off, then the whole system goes off the rails.

- You often end up having to pass a massive amount of shared context to every agent which just increases the cost dramatically.

Curiously enough we had an architect from OpenAI tell us the same thing about agent systems a few days ago (our company is a big spender so they serve a consulting function), so I don't think anybody is really finding success with multi-agent systems currently. IMO the core tech is nowhere near good enough yet.
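
A rough back-of-the-envelope sketch of the two points above, not a measurement. It assumes each agent's errors are independent, that nothing downstream recovers from an upstream mistake, and that every agent receives the same shared context. The function names and the 95% figure are made up for illustration.

```python
def chain_success_rate(per_step_accuracy: float, num_agents: int) -> float:
    """Probability that every agent in a linear chain gets its step right,
    assuming independent errors and no recovery (a simplifying assumption)."""
    return per_step_accuracy ** num_agents


def total_prompt_tokens(shared_context_tokens: int, num_agents: int) -> int:
    """Prompt tokens spent on shared context alone when each agent
    receives the full context once (hypothetical cost model)."""
    return shared_context_tokens * num_agents


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, "agents:", round(chain_success_rate(0.95, n), 3))
    print("context tokens for 5 agents:", total_prompt_tokens(8000, 5))
```

Under these assumptions, a chain of 10 agents that are each right 95% of the time finishes cleanly only about 60% of the time, and the shared-context bill grows linearly with the number of agents.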

> Too many errors that just propagate on top of each other

LLMs are like the perfect improv comedy troupe, they virtually always say “yes, and…”

> perfect improv comedy troupe

Check out Vtubers like CodeMiko, who improvs against LLM agents. Or 24/7 streaming LLM cartoon shows that take audience plot suggestions.

We do multistep programs in louie.ai via a variety of agents/tools, like "get X data from DB Y, wrangle cols A+B in Python, and then draw an interactive map + graph"

The ultimate answer is fairly short if you are a senior Python data scientist, like 50 LOC. The agents will wander and iterate until they push through. You might correct & tweak if a bit off.

Importantly, this does agents opposite of the way Devin AI engineer replacements are presented. Here, you get it to do a few steps, and then move on to the next few steps. The agents still crank away a ton and do all sorts of clever things for you... to get you more reliably to the next step, vs something big & wrong.

So the human is like a reviewer, coming in, checking things, tweaking etc, then sending it back to the machine? (At which point the cycle continues)

Yes, imagine data analysis scenarios like Excel users or Jupyter notebooks, or operational investigations like user 360's and security incidents. Just now defaulting to natural language and connected to your data silos and a variety of analytics tools & libraries.

We try to make the generated code and backing data explainable. Users are figuring out the scenario by having the AI go ahead for them, and automating much of the debug loop in typical coding and investigations, so folks can focus more on the analysis, less on syntax, schemas, libraries, and be more ambitious on each step.

Importantly, it is still kind of like making a much more accessible Jupyter notebook or editable Excel/doc, vs a linear chat session. Instead of generating the whole notebook, finding it buggy, and starting over (~= Devin, or notebook.io's ChatGPT plugin), you drive it forward only 1-3 cells at a time, and since it is an interactive document, you can edit those, go to the next, or non-destructively edit earlier ones. In contrast, ChatGPT's data assistant deletes cells below the current edit, which would stink in a normal data env.

There are other differences, but from a perspective of using genAI well, we budget 3-60s for genAI assisting in 1-3 steps, aiming for 10-100x productivity wins and a lot more peace of mind during it. Taking 1-3 steps forward may mean the AI takes 3-10 internally due to backtracking / CoT / etc.

We could let the system take 100 turns, and have interesting experiments there such as around security investigations, but the use cases become more niche due to cascading errors => reliability.
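
The "few steps, then review" loop described in this comment could be sketched roughly like this. It is not louie.ai's implementation; `drive_in_batches`, `run_step`, and `review` are hypothetical names, and the agent and the human reviewer are stood in for by plain callables.

```python
from typing import Callable, List


def drive_in_batches(
    plan: List[str],
    run_step: Callable[[str], str],
    review: Callable[[List[str]], bool],
    batch_size: int = 3,
) -> List[str]:
    """Run a plan a few steps at a time, pausing for review after each batch.

    Stops at the first rejected batch so a small error cannot cascade
    through the remaining steps.
    """
    accepted: List[str] = []
    for start in range(0, len(plan), batch_size):
        batch = plan[start:start + batch_size]
        results = [run_step(step) for step in batch]
        if not review(results):
            break  # hand control back to the human before going further
        accepted.extend(results)
    return accepted


if __name__ == "__main__":
    plan = ["load data", "wrangle cols", "draw map", "draw graph"]
    done = drive_in_batches(plan, run_step=str.upper, review=lambda r: True, batch_size=2)
    print(done)
```

Keeping `batch_size` small is the design choice here: each review point bounds how far a bad intermediate result can travel before someone looks at it.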

Thanks @beoberha, I am too. I like one take I heard on Twitter. The sentiment was something like: these types of systems are useful under the AI-Powered Productivity industry, which has incremental gains, no big bangs. Said another way, if your job was to help a TON of your employees be more productive individually, it is worth it because companies measure those efforts broadly and the payoff is there. But again, not big. My advice is for folks to stay lower level and hook AI automation up with simple, closed-loop LLM patterns that feel more like basic API calls in a choreographed manner. OMG, hope all that made sense
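
One possible reading of "simple, closed-loop LLM patterns" is a fixed sequence of calls, each checked by a validator and retried a bounded number of times. The sketch below assumes that reading; `call_model` is a placeholder for whatever LLM client you use, and the labels and length limit are invented for the example.

```python
from typing import Callable, Optional


def closed_loop_call(
    call_model: Callable[[str], str],
    is_valid: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model, validate the answer, and retry a bounded number of times."""
    for _ in range(max_attempts):
        answer = call_model(prompt)
        if is_valid(answer):
            return answer
    return None  # give up explicitly instead of passing bad output downstream


def summarize_then_classify(text: str, call_model: Callable[[str], str]) -> Optional[str]:
    """A choreographed two-step pipeline: the order of calls is fixed in code."""
    summary = closed_loop_call(
        call_model, lambda s: 0 < len(s) <= 200, f"Summarize: {text}"
    )
    if summary is None:
        return None
    labels = {"bug", "feature", "question"}
    return closed_loop_call(call_model, lambda s: s in labels, f"Classify: {summary}")
```

The control flow lives in ordinary code rather than in an agent's head, which is what makes each step feel like a basic API call.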

that's actually a great reply, thanks

A lot of folks I've spoken with say that single-agent systems are still extremely limited, let alone multi-agent platforms. In general, it seems to boil down to:

- Agents need lots of manual tuning and guardrails to make them useful

- Agents with too many guardrails are not general-purpose enough to be worth the time and effort to build

I believe truly great agents will only come from models whose weights are dynamically updated. I hope I'm wrong.

By the time you do get around to it, OpenAI will have built a full interface for this. This is the type of stuff that’s gonna get steamrolled.

I'm impatiently waiting to become the ultimate armchair music video director I've always dreamed of once this video AI thing rolls out...

Pretty much this. I'd love counterexamples of startups in the space that haven't been crushed from the top yet.