| HN Mirror


	by torginus 99 days ago
	My understanding is that for MoE with a top-K architecture, total model size doesn't really matter for compute: you can have 10 32GB experts or a thousand, and if only 2-3 of them are active at the same time, your inference workload will be identical; only your hard drive traffic will increase. Which seems to be the case, seeing how hungry the industry has been for hard drives lately.
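
	To make the arithmetic concrete, here is a rough sketch of that argument, not a real inference stack. The function names (route_top_k, per_token_bytes) and the cost model are made up for illustration. It assumes a single token passing through a single MoE layer, takes the 32GB expert size from the example above, and ignores the router itself, whose cost does grow with the number of experts but is tiny next to an expert.

```python
EXPERT_BYTES = 32 * 10**9  # 32GB per expert, as in the comment's example


def route_top_k(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def per_token_bytes(num_experts, k, expert_bytes=EXPERT_BYTES):
    """Toy cost model (an assumption): bytes touched vs. bytes stored."""
    active = k * expert_bytes            # weights actually used for this token
    stored = num_experts * expert_bytes  # weights that have to live somewhere
    return active, stored


# Same k, very different expert counts.
small = per_token_bytes(num_experts=10, k=2)
large = per_token_bytes(num_experts=1000, k=2)

print("experts picked:", route_top_k([0.1, 0.9, 0.3, 0.7], k=2))
print("active bytes  :", small[0], "vs", large[0])
print("stored bytes  :", small[1], "vs", large[1])
```

	Under these assumptions the active weights per token come out the same for both configurations, while the stored weights differ by 100x. Which experts are active changes from token to token, so once the full set no longer fits in memory the extra weights have to be fetched from somewhere slower, and that is where the storage traffic comes from.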