Hacker News
by 2024user 622 days ago
What is the challenge here? Orchestration/triage to specific assistants seems straightforward.

There isn't one.

The real challenge for at-scale inference is that model compute takes too long to keep normal API connections open, so you need a message-passing system in place. This system also needs to be able to deliver large files for multi-modal models if it's not going to be obsolete in a year or two.
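
A minimal sketch of the message-passing idea above, assuming an in-memory queue stands in for the real transport. `JobBroker`, `submit`, `fetch`, and `fake_model` are hypothetical names for illustration, not any existing system's API. The caller gets a job ID back immediately and collects the result later, so no connection stays open while the model runs.

```python
import queue
import threading
import uuid


class JobBroker:
    """In-memory stand-in for a message-passing layer (hypothetical)."""

    def __init__(self):
        self._requests = queue.Queue()   # pending (job_id, payload) messages
        self._results = {}               # job_id -> finished result
        self._done = {}                  # job_id -> Event set on completion
        self._lock = threading.Lock()

    def submit(self, payload):
        """Enqueue a job and return its ID without waiting for the model."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._done[job_id] = threading.Event()
        self._requests.put((job_id, payload))
        return job_id

    def fetch(self, job_id, timeout=None):
        """Return the result, or None if it is not ready within `timeout`."""
        with self._lock:
            event = self._done[job_id]
        if not event.wait(timeout):
            return None
        with self._lock:
            return self._results[job_id]

    def worker_loop(self, run_model):
        """Consume jobs until a None sentinel arrives."""
        while True:
            item = self._requests.get()
            if item is None:
                break
            job_id, payload = item
            result = run_model(payload)  # the slow inference call goes here
            with self._lock:
                self._results[job_id] = result
                self._done[job_id].set()

    def shutdown(self):
        """Ask one worker to stop."""
        self._requests.put(None)


def fake_model(prompt):
    # Placeholder for a long-running inference call.
    return prompt.upper()
```

To try it, start `worker_loop` in a thread, call `submit("hello")`, and pass the returned ID to `fetch`. A real deployment would swap the queue for a durable broker and pass large files by reference rather than inline.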

I built a proof of concept using email, of all things, but could never get anyone to fund the real deal, which could run at larger than web scale.

Why not use Temporal?

An example use with AWS Bedrock: https://temporal.io/blog/amazon-bedrock-with-temporal-rock-s...

Because when you see someone try to reinvent Erlang in another language for the Nth time you know you can safely ignore them.

ooc how does Temporal reinvent Erlang?

I don't.

Sorry, you mean the company.

Thanks. Could something like Kafka be used?

You could use messenger pigeons if you felt like it.

People really don't understand how much better LLM swarms get with more agents. I never hit a point of diminishing returns on text quality over two days of running a swarm of llama2 70Bs on an 8x4090 cluster during the stress test.

You would need something similar to, but better than, WhatsApp to handle the firehose of data that needs to cascade between agents when you start running this at scale.

>People really don't understand how much better LLM swarms get with more agents. I never hit a point of diminishing returns on text quality

Could you elaborate, please?

One use for swarms is to replace one single agent with one long prompt by multiple agents/prompts, increasing performance by splitting one big task into many. It is very time-consuming though, as it requires experimenting to determine how best to divide one task into subtasks, including writing code to parse and sanitize each task output and plug it back into the rest of the agent graph.
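
A rough sketch of that split/parse/plug-back loop, assuming each agent is just a callable that takes a prompt and returns raw text and is asked to reply in JSON. The prompts, `parse_json_output`, and `run_pipeline` are made up for illustration and are not taken from any framework.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, raw text out


def parse_json_output(raw: str) -> dict:
    """Sanitize an agent reply: drop chatter around the JSON object, then parse."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"no JSON object in agent output: {raw!r}")
    return json.loads(raw[start:end + 1])


def run_pipeline(task: str, llm: LLM) -> str:
    # Agent 1: split the big task into subtasks.
    plan = parse_json_output(
        llm(f'Split this task into subtasks. Reply as JSON {{"subtasks": [...]}}.\nTask: {task}')
    )
    # Sanitize: coerce to strings and drop empty entries.
    subtasks = [str(s).strip() for s in plan.get("subtasks", []) if str(s).strip()]

    # Agent 2: solve each subtask on its own short prompt.
    partials = []
    for sub in subtasks:
        reply = parse_json_output(
            llm(f'Solve this subtask. Reply as JSON {{"answer": "..."}}.\nSubtask: {sub}')
        )
        partials.append(str(reply["answer"]).strip())

    # Agent 3: plug the cleaned partial answers back into a final merge step.
    merged = parse_json_output(
        llm('Combine these partial answers into one. Reply as JSON {"answer": "..."}.\n'
            + "\n".join(partials))
    )
    return str(merged["answer"]).strip()
```

Most of the experimentation time goes into the prompts and into deciding where the split points are. The parsing glue stays roughly this shape.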

DSPy [1] seems to target this problem space, but last time I checked it only focused on single-prompt optimization (for instance, by selecting which few-shot examples lead to the best prompt performance). I have yet to find a framework that tackles agent graph optimization, although research on the topic has been done [2][3][4].

[1] DSPy: The framework for programming—not prompting—foundation models: https://github.com/stanfordnlp/dspy

[2] TextGrad: Automatic 'Differentiation' via Text -- using large language models to backpropagate textual gradients: https://github.com/zou-group/textgrad

[3] What's the Magic Word? A Control Theory of LLM Prompting: https://arxiv.org/abs/2310.04444

[4] Language Agents as Optimizable Graphs: https://arxiv.org/abs/2402.16823
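
For the single-prompt case, selecting few-shot examples by measured performance can be sketched as a brute-force search over demo subsets scored on a small dev set. This is only an illustration of the idea, not DSPy's API or algorithm. `select_few_shots`, the `Q:`/`A:` prompt layout, and exact-match scoring are all assumptions.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer)


def select_few_shots(
    candidates: List[Example],
    dev_set: List[Example],
    llm: Callable[[str], str],  # hypothetical interface: prompt in, text out
    k: int = 2,
) -> Tuple[List[Example], float]:
    """Try every k-subset of candidate demos; keep the one with the best dev accuracy."""
    if not dev_set:
        raise ValueError("dev_set must not be empty")
    best_score, best_demos = -1.0, ()
    for demos in combinations(candidates, k):
        # Build the few-shot prefix from the chosen demonstrations.
        prefix = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in demos)
        # Exact-match accuracy on the dev set (an assumed metric).
        correct = sum(
            llm(f"{prefix}Q: {q}\nA:").strip() == gold for q, gold in dev_set
        )
        score = correct / len(dev_set)
        if score > best_score:
            best_score, best_demos = score, demos
    return list(best_demos), best_score
```

The search is exponential in `k`, so it only works for tiny candidate pools. Doing the same thing across a whole agent graph, where every node has its own prompt and demos, is the part that still seems to lack a framework.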

>Could you elaborate, please?

No.

I've tried explaining this to supposedly smart people in both a 15-minute pitch deck and a research paper, and unless they were inclined to think it from the start, no amount of proof has managed to convince them.

I figure it's just not possible to convince people, even with the proof in front of them, of how powerful the system is. The same way that we still have people arguing _right now_ that all LLMs are just autocomplete on steroids.

Prove how powerful "the system" is by doing something useful or value-generating with it. Then people will believe you. Talk is cheap.

> people arguing _right now_ that all LLMs are just autocomplete on steroids.

Funny, because when I learned about how LLMs worked my immediate thought was "Oh, humans are just LLMs on steroids". So autocomplete on steroids squared.

I'm inclined to think it from the start.