Hacker News
by erikaww 808 days ago
Combining these would be pretty cool: it could further reduce compute for MoE while keeping the performance.
GaggiX 808 days ago
The paper already shows a combination of the two: Mixture-of-Depths-and-Experts (MoDE).