Hacker News
by
janalsncm
308 days ago
Quick example: Kimi K2 is a recent large mixture-of-experts model. Each “expert” is really just a path within it. For each token, only 32B of its 1T parameters are active. This means only 3.2% of the model is active for any one token.
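To make the arithmetic concrete, here is a rough Python sketch. The 32B and 1T figures come from the comment above; the toy router (8 experts, top 2 per token, random scores) is purely a hypothetical illustration of how only some paths fire per token, not Kimi K2's actual configuration or routing code.

```python
import random

def active_fraction(active_params, total_params):
    """Share of parameters that do work for a single token."""
    return active_params / total_params

def route_top_k(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Figures quoted in the comment: 32B active out of 1T total.
frac = active_fraction(32e9, 1e12)
print(f"{frac:.1%}")  # 3.2%

# Hypothetical toy setup: 8 experts, 2 picked per token.
NUM_EXPERTS = 8  # assumption, for illustration only
TOP_K = 2        # assumption, for illustration only
rng = random.Random(0)
for token in ["The", "cat", "sat"]:
    scores = [rng.random() for _ in range(NUM_EXPERTS)]
    chosen = route_top_k(scores, TOP_K)
    print(token, "-> experts", chosen)
```

Different tokens generally pick different experts, so over a long text most of the model can end up being used even though any single token touches only a small slice of it.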
Sophira
308 days ago
That sounds surprisingly like "Humans only use 10% of their brain at any given time."