by
laborcontract
694 days ago
Sounds like you're describing mixture of experts (MoE), the architecture reportedly used in OpenAI's GPT-4 and used in Mistral's Mixtral series of models.
pants2
694 days ago
Not really. MoE is trained all at once, and the 'experts' don't have pre-defined specializations. They end up being more like "punctuation expert" and "pronoun expert" than "math expert" and "french expert".
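To make the "no pre-defined specializations" point concrete, here's a rough sketch of a top-k gated MoE layer in plain Python. This is a toy under stated assumptions, not any particular model's implementation: `moe_layer`, `gate_W`, and `experts` are names made up for illustration, each expert is just a weight matrix, and the softmax-over-the-top-k weighting is one common variant. Nothing in the code says which expert handles what; the gate is a learned matrix trained jointly with the experts, so any specialization falls out of training per token.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def moe_layer(x, gate_W, experts, k=2):
    """Route one token vector x through its top-k experts.

    gate_W:  one row per expert (hypothetical learned gate weights).
    experts: list of weight matrices, one per expert (toy linear experts).
    Returns the mixed output and the indices of the chosen experts.
    """
    logits = matvec(gate_W, x)  # one gate logit per expert
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(experts[0])
    for w, i in zip(weights, top):
        y = matvec(experts[i], x)  # only the selected experts run
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top
```

With `k=1`, `gate_W = [[1, 0], [0, 1]]`, and `x = [2.0, 0.0]`, the gate picks expert 0 purely because of the token's values, not because anyone labeled that expert.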
laborcontract
694 days ago
Haven't tried any yet, but it sounds like parent may be interested in an LLM router.
https://github.com/lm-sys/RouteLLM
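For contrast with MoE, here's a minimal sketch of what routing at the model level could look like: a separate scorer decides per prompt which whole model to call. To be clear, this is not RouteLLM's API. `Router`, `toy_score`, the `"strong"`/`"weak"` labels, and the word-count heuristic are all hypothetical placeholders; a real router would presumably use a learned scorer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Pick a model label per prompt from a difficulty score (hypothetical)."""

    score: Callable[[str], float]  # assumed to return a value in [0, 1]
    threshold: float = 0.5

    def route(self, prompt: str) -> str:
        # Send hard-looking prompts to the expensive model, the rest to the cheap one.
        return "strong" if self.score(prompt) >= self.threshold else "weak"


def toy_score(prompt: str) -> float:
    # Placeholder heuristic: treat longer prompts as harder, capped at 1.0.
    return min(len(prompt.split()) / 50.0, 1.0)


router = Router(score=toy_score)
```

The difference from MoE is where the decision happens: here it is made once per request, outside the models, and each target can be an independently trained specialist.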