|
|
|
|
|
by coderenegade
12 days ago
|
|
It's hard to know for sure. There are good information theoretic reasons to suspect that general models will always be better than smaller expert models, but maybe a MoE can claw some performance back, albeit with redundant computation. The properties of conditional entropy, for instance, always favor more generality. This assumes that the harness isn't a factor, or is at least equivalent across different models. |
|