| HN Mirror

	by coderenegade 63 days ago
	It's hard to know for sure. There are good information-theoretic reasons to suspect that general models will always be better than smaller expert models, but maybe an MoE can claw some performance back, albeit with redundant computation. The properties of conditional entropy, for instance, always favor more generality: conditioning on more information can never increase entropy. This assumes that the harness isn't a factor, or is at least equivalent across different models.
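
	To make the conditional entropy point concrete, here is a minimal sketch on a toy distribution. The variables and the XOR setup are my own illustrative assumptions, not anything measured on real models: X stands in for what a narrow expert sees, Z for extra context only a more general model sees, and Y for the thing being predicted. It only illustrates the bound H(Y|X,Z) <= H(Y|X); it says nothing about whether a real model of a given size actually reaches that bound.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, given):
    """Return H(Y | selected context variables) in bits.

    joint maps outcome tuples (x, z, y) to probabilities; Y is the last entry.
    given lists the tuple indices of the variables being conditioned on.
    """
    ctx_y = defaultdict(float)  # p(context, y)
    ctx = defaultdict(float)    # p(context)
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in given)
        ctx_y[(key, outcome[-1])] += p
        ctx[key] += p
    # H(Y | C) = -sum over (c, y) of p(c, y) * log2 p(y | c)
    return -sum(p * math.log2(p / ctx[key])
                for (key, _y), p in ctx_y.items() if p > 0)

# Hypothetical toy world: X and Z are independent fair bits, Y = X XOR Z.
joint = {(x, z, x ^ z): 0.25 for x in (0, 1) for z in (0, 1)}

h_narrow = conditional_entropy(joint, given=(0,))     # condition on X only
h_general = conditional_entropy(joint, given=(0, 1))  # condition on X and Z

print(f"H(Y|X)   = {abs(h_narrow):.3f} bits")
print(f"H(Y|X,Z) = {abs(h_general):.3f} bits")
```

	In this toy case seeing X alone leaves Y completely uncertain, while seeing both X and Z pins it down exactly. The inequality holds for any joint distribution, which is the sense in which extra generality can't hurt in principle.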