Hacker News
by GaggiX, 809 days ago
In the paper, they already show these two combined, as Mixture-of-Depths-and-Experts (MoDE).