by zingelshuher
812 days ago
I ran some tests. A single dense model is better than an MoE of the same total size. An MoE that uses a single expert out of N is better than a dense model of the same size as one expert. Two experts are better than one. That was on a small LLM; I'm not sure if it scales.