HN Mirror



	by functional_dev 69 days ago
	This confused me at first as well. Inactive experts skip compute, but their weights are still loaded, so memory does not shrink at all. I found this visualisation helpful: https://vectree.io/c/sparse-activation-patterns-and-memory-e...
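	A rough back-of-the-envelope sketch of the point above. It assumes a hypothetical MoE layer (8 experts, top-2 routing, plain two-matrix FFN experts, bf16 weights). These numbers are chosen for illustration and are not taken from the linked page. Router and gating weights are ignored, and real models often use gated FFNs with three matrices per expert, so treat this as a sketch, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class MoELayer:
    """Hypothetical MoE feed-forward layer; router/gating weights are ignored."""
    num_experts: int
    top_k: int                # experts actually executed per token
    d_model: int
    d_ff: int
    bytes_per_param: int = 2  # bf16/fp16


def expert_params(layer: MoELayer) -> int:
    # One plain FFN expert: up-projection (d_model x d_ff) + down-projection (d_ff x d_model).
    return 2 * layer.d_model * layer.d_ff


def resident_weight_bytes(layer: MoELayer) -> int:
    # Every expert stays loaded, because the router may send any token to any expert.
    return layer.num_experts * expert_params(layer) * layer.bytes_per_param


def active_weight_bytes(layer: MoELayer) -> int:
    # Bytes touched for a single token: only the top_k selected experts.
    return layer.top_k * expert_params(layer) * layer.bytes_per_param


def active_flops_per_token(layer: MoELayer) -> int:
    # Only top_k experts run; count ~2 FLOPs per multiply-accumulate.
    return layer.top_k * 2 * expert_params(layer)


def compute_fraction(layer: MoELayer) -> float:
    # Expert compute actually spent, relative to running every expert.
    return layer.top_k / layer.num_experts


def memory_overhead(layer: MoELayer) -> float:
    # How much more memory is resident than the active experts alone would need.
    return resident_weight_bytes(layer) / active_weight_bytes(layer)


if __name__ == "__main__":
    layer = MoELayer(num_experts=8, top_k=2, d_model=4096, d_ff=14336)
    gib = resident_weight_bytes(layer) / 2**30
    print(f"resident expert weights: {gib:.2f} GiB per layer")  # 1.75 GiB
    print(f"compute used: {compute_fraction(layer):.0%}")        # 25%
    print(f"memory overhead: {memory_overhead(layer):.1f}x")     # 4.0x
```

	So with top-2 of 8, each token pays for roughly a quarter of the expert compute, while the full 8 experts' weights still have to sit in memory.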