Hacker News
by dTal 901 days ago
This sounds a lot like the mixture-of-experts architecture, which the current best-performing language models (reportedly GPT-4, and Mixtral-8x7B) already use.

So congratulations, you win!

1 comment

That's not really how MoEs work. The experts never interact with each other directly. There is one manager-type model that takes a prompt, directs inference for each token to one or more experts, chooses the best response, and continues. A closer analogy for what you describe would be a "swarm of agents". (There are a handful of names for this approach; I think "swarm" is catching on the most.)
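
For reference, here's a rough sketch of that routing step, assuming top-k gating of the kind publicly described for Mixtral. Everything named here (`moe_layer`, `router_weights`, the toy experts that just scale their input) is made up for illustration and not taken from any real model. One caveat: in this common formulation the selected experts' outputs are blended by gate weight rather than a single "best" response being picked, and the experts never see each other's outputs.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a small list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, router_weights, experts, k=2):
    """Route one token vector through the top-k experts and mix their outputs."""
    # Router: one logit per expert (a plain dot product in this sketch).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for g, i in zip(gates, top):
        y = experts[i](token)  # each expert only ever sees the token
        out = [o + g * v for o, v in zip(out, y)]
    return out, top

# Toy setup (hypothetical): four "experts" that multiply the input by 1..4.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

out, chosen = moe_layer([1.0, 2.0], router_weights, experts, k=2)
print(chosen, out)
```

In a real model this happens per token inside each MoE layer, so different tokens of the same prompt can land on different experts.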