|
|
|
|
|
by recipe19
334 days ago
|
|
Wasn't the "mixture of experts" a big thing in late 2023? The idea was that a vendor has a number of LLMs fine-tuned for specific tasks, none necessarily better than other, and that they applied heuristics to decide which one to rope in for which queries. |
|
That’s how people keep interpreting it but it’s incorrect. MoE is just a technique to decompose your single giant LLM into smaller models where a random one gets activated for each token. This is great because you need 1/N memory bandwidth to generate a token. Additionally, in the cloud, you split the model parts to different servers to improve utilization and drive down costs.
But the models aren’t actually separated across high level concepts.