by GaggiX 809 days ago
In the paper, they already combine these two as Mixture-of-Depths-and-Experts (MoDE).