| HN Mirror

	by zingelshuher 859 days ago
	I ran some tests. A single dense model of the same total size is better than the MoE. An MoE that uses a single expert out of N is better than a dense model the same size as one expert. Using 2 experts is better than using one. That was on a small LLM; not sure if it scales.
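
	For anyone trying to pin down what "same size" means in each comparison, here is a rough sketch of the parameter arithmetic for one MoE feed-forward layer. It assumes a plain two-matrix FFN per expert, a linear router, and no biases; the function names and the sizes in the usage lines are made up for illustration and are not the setup from the tests above.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in one dense FFN block: up-projection plus down-projection (biases ignored)."""
    return 2 * d_model * d_ff


def moe_param_counts(d_model: int, d_ff: int, n_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active) weight counts for one MoE FFN layer.

    total  = all experts plus the router (what has to be stored)
    active = the top_k experts a token is routed to, plus the router
    """
    expert = ffn_params(d_model, d_ff)
    router = d_model * n_experts  # assumed: a single linear layer producing one score per expert
    total = n_experts * expert + router
    active = top_k * expert + router
    return total, active


# Hypothetical sizes, chosen only to show the three quantities being compared.
total, active = moe_param_counts(d_model=512, d_ff=2048, n_experts=8, top_k=2)
one_expert = ffn_params(512, 2048)
print(f"MoE total: {total:,}  MoE active (top-2): {active:,}  one expert: {one_expert:,}")
```

	Read this way, the first claim compares the MoE against a dense model matching `total`, the second against a dense model matching `one_expert`, and the third changes `top_k` from 1 to 2.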