

	by zozbot234 1 day ago
	For sparse MoE models, the individual expert layers that inference is routed through are actually quite small: single-digit megabytes or so.
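	As a rough back-of-envelope check, the size of one expert in one layer is the number of projection matrices times hidden dim times expert FFN dim times bytes per weight. The sketch below assumes a SwiGLU-style expert (three matrices: gate, up, down, no biases). The dimensions used (hidden 2048, expert FFN 768) are hypothetical values chosen to resemble a fine-grained MoE; they are not taken from any specific model here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertShape:
    hidden_dim: int
    ffn_dim: int
    # Assumption: SwiGLU-style expert with gate, up and down projections.
    num_matrices: int = 3


def expert_params(shape: ExpertShape) -> int:
    # Each projection is a hidden_dim x ffn_dim (or transposed) weight matrix; biases ignored.
    return shape.num_matrices * shape.hidden_dim * shape.ffn_dim


def expert_mib(shape: ExpertShape, bits_per_weight: float) -> float:
    # Convert the parameter count to mebibytes at a given quantization width.
    return expert_params(shape) * bits_per_weight / 8 / 2**20


if __name__ == "__main__":
    shape = ExpertShape(hidden_dim=2048, ffn_dim=768)  # hypothetical dimensions
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {expert_mib(shape, bits):.2f} MiB")
```

	With these assumed dimensions that works out to 9.00 MiB at 16-bit, 4.50 MiB at 8-bit and 2.25 MiB at 4-bit. The estimate scales linearly with both dimensions, so models with wider experts land well above this range.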