by coderenegade 12 days ago
It's hard to know for sure. There are good information-theoretic reasons to suspect that general models will always beat smaller expert models, though an MoE might claw some performance back, albeit with redundant computation. Conditioning never increases entropy, for instance (H(Y|X,Z) <= H(Y|X)), so having more to condition on can't hurt in principle, and that always favors generality. This assumes that the harness isn't a factor, or is at least equivalent across the models being compared.
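To make the conditional entropy point concrete, here is a minimal sketch on a toy distribution. It assumes "generality" can be read as conditioning on extra context Z on top of X, which is my interpretation rather than something the comment spells out. The helper `conditional_entropy` and the XOR example are made up for illustration.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, target, given):
    """Return H(target | given) in bits for a joint pmf.

    joint: dict mapping outcome tuples to probabilities.
    target, given: tuples of indices into each outcome tuple.
    (Hypothetical helper, written only for this illustration.)
    """
    p_given = defaultdict(float)
    p_both = defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        p_given[g] += p
        p_both[(g, t)] += p
    # H(T | G) = -sum over (g, t) of p(g, t) * log2(p(t | g))
    return -sum(
        p * math.log2(p / p_given[g])
        for (g, _t), p in p_both.items()
        if p > 0
    )

# Toy joint pmf over (x, z, y): x and z are independent fair bits, y = x XOR z.
joint = {(x, z, x ^ z): 0.25 for x in (0, 1) for z in (0, 1)}

# "Expert" view: predicts y from x alone.
h_y_given_x = conditional_entropy(joint, target=(2,), given=(0,))
# "General" view: predicts y from both x and z.
h_y_given_xz = conditional_entropy(joint, target=(2,), given=(0, 1))

# Conditioning on more never increases entropy.
assert h_y_given_xz <= h_y_given_x + 1e-12
```

In this toy case, seeing only x leaves y completely uncertain, while seeing both x and z pins it down exactly. The inequality is about the best achievable uncertainty, so it says nothing about whether a real model with finite capacity actually gets there.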