Hacker News
by
rughouse
808 days ago
It’s very similar to Mixture of Experts, but instead of routing tokens to multiple experts, you "deploy to a single expert which can be dynamically skipped".
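To make the "single expert which can be dynamically skipped" idea concrete, here is a rough sketch in plain Python. It is not the paper's implementation. The function name `route_single_expert`, the top-`capacity` selection rule, and the toy `double` expert are all assumptions made up for illustration. Each token gets a router score, the highest-scoring tokens go through the one expert and get its output added to their residual, and every other token skips the expert and passes through unchanged.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def route_single_expert(
    tokens: Sequence[Vector],
    router_scores: Sequence[float],
    expert: Callable[[Vector], Vector],
    capacity: int,
) -> List[Vector]:
    """Apply `expert` to the `capacity` highest-scoring tokens; skip the rest.

    Hypothetical helper: the selection rule (top-`capacity` by score) is an
    assumption for this sketch, not something stated in the thread.
    """
    # Rank token indices by router score, highest first. sorted() is stable,
    # so tokens with equal scores keep their original order.
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    chosen = set(ranked[:capacity])

    outputs: List[Vector] = []
    for i, x in enumerate(tokens):
        if i in chosen:
            # Processed path: residual plus the expert's update.
            update = expert(x)
            outputs.append([a + b for a, b in zip(x, update)])
        else:
            # Skipped path: the token passes through unchanged.
            outputs.append(list(x))
    return outputs


def double(x: Vector) -> Vector:
    """Toy stand-in for the expert (hypothetical, for illustration only)."""
    return [2.0 * v for v in x]


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
scores = [0.1, 0.9, 0.5]
result = route_single_expert(tokens, scores, double, capacity=2)
print(result)
```

With `capacity=2`, the first token has the lowest score, so it skips the expert and comes out unchanged. The other two are processed. Lowering `capacity` reduces how many tokens pay for the expert at all, which is where the compute saving would come from.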
erikaww
808 days ago
Mixing these would be pretty cool. It would further reduce compute for MoE while keeping the performance.
GaggiX
808 days ago
In the paper they already show a combination of the two, called Mixture-of-Depths-and-Experts (MoDE).