HN Mirror

	by schleck8 834 days ago
	You absolutely can since it has a size advantage either way. MoE means the expert model performs better BECAUSE of the overall model size.

1 comment

Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.
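
To make the total-versus-active distinction concrete, here is a rough back-of-the-envelope sketch. The two totals are the figures quoted in this thread; the expert count, experts routed per token, and shared-parameter fraction are placeholders picked for illustration, not known values for any real model, and the helper `moe_active_params` is hypothetical.

```python
def moe_active_params(total_params, num_experts, experts_per_token, shared_fraction):
    """Rough per-token active parameter count for a mixture-of-experts model.

    Assumes `shared_fraction` of the parameters (attention, embeddings, etc.)
    is used by every token and the remainder is split evenly across experts.
    """
    shared = total_params * shared_fraction
    per_expert = (total_params - shared) / num_experts
    return shared + experts_per_token * per_expert


# Totals quoted in the thread.
moe_total = 1.8e12    # "1.8T MoE"
dense_total = 400e9   # "Llama 3 400B"; a dense model uses all of it per token

# Hypothetical MoE configuration, chosen only for illustration.
moe_active = moe_active_params(
    moe_total, num_experts=16, experts_per_token=2, shared_fraction=0.1
)

print(f"MoE total:    {moe_total / 1e9:.1f}B")
print(f"MoE active:   {moe_active / 1e9:.1f}B per token")
print(f"Dense active: {dense_total / 1e9:.1f}B per token")
```

Under these made-up settings the MoE model stores far more parameters than the dense one but touches a comparable number per token, which is why the total size alone does not settle the comparison.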