by frde_me 81 days ago
Aren't you describing why they use mixture of experts, where only a subset of the weights is activated depending on the query?
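For anyone unfamiliar with the idea, here is a rough sketch of top-k routing in plain Python. It is only an illustration, not how any particular model does it: `moe_forward`, the toy `experts`, and the `gate_weights` values are all made up for the example, and real systems use learned weights and batched tensor ops.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a small list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_forward(x, gate_weights, experts, k=2):
    """Route input x to the top-k experts and mix their outputs (hypothetical helper)."""
    # One gate score per expert, computed from the input itself.
    scores = [dot(w, x) for w in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate over the selected experts only.
    probs = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * v for o, v in zip(out, y)]
    return out, top

# Toy setup (assumed values): four "experts" that just scale the input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

out, chosen = moe_forward([1.0, 0.5], gate_weights, experts, k=2)
print(chosen, out)
```

Only the `k` selected experts run for a given input, so most of the weights sit idle on any single query even though the total parameter count is large.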