| HN Mirror



	by janalsncm 308 days ago
	Quick example: Kimi K2 is a recent large mixture-of-experts model. Each “expert” is really just a path within it. For each token, 32B of its 1T parameters are active. This means only 3.2% of the parameters are active for any one token.
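	To make the arithmetic concrete, here is a rough sketch. The 32B and 1T figures are the ones quoted above; the `route_top_k` helper and its toy scores are purely hypothetical illustrations of picking a few experts per token, not Kimi K2's actual router.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used for a single token."""
    return active_params / total_params


def route_top_k(scores: list[float], k: int) -> list[int]:
    """Toy router (hypothetical): return the indices of the k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Figures from the comment: 32B active out of 1T total.
print(f"{active_fraction(32e9, 1e12):.1%}")

# Made-up router scores for 4 experts; only the top 2 would run for this token.
print(route_top_k([0.1, 0.9, 0.3, 0.7], k=2))
```

	The point is that a different subset can be picked for every token, so "3.2% active" describes one token at a time, not the model as a whole.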

1 comment

That sounds surprisingly like "Humans only use 10% of their brain at any given time."