Hacker News
by janalsncm 308 days ago
Quick example: Kimi K2 is a recent large mixture-of-experts model. Each "expert" is really just a path within it. At each token, 32B of its 1T total parameters are active, so only about 3.2% of the parameters are used for any one token.
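A minimal Python sketch of the arithmetic, plus a toy router that picks a few experts per token. The 32B and 1T figures come from the comment above. The `top_k_route` helper, the 8 experts, the scores, and k=2 are hypothetical illustration values, not Kimi K2's actual configuration or code.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Fraction of the model's parameters used for a single token."""
    return active_params / total_params

# Figures from the comment: 32B active parameters out of 1T total.
frac = active_fraction(32e9, 1e12)
print(f"{frac:.1%}")  # 3.2%

def top_k_route(scores: list[float], k: int) -> list[int]:
    """Hypothetical toy router: indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical router scores for one token over 8 experts; only 2 experts run.
token_scores = [0.1, 0.7, 0.05, 0.3, 0.9, 0.2, 0.0, 0.4]
chosen = top_k_route(token_scores, k=2)
print(chosen)  # [4, 1]
```

The router here is deliberately simplified: it only selects experts and omits the gating weights a real MoE layer would use to mix the chosen experts' outputs.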

That sounds surprisingly like "Humans only use 10% of their brain at any given time."