by jychang 219 days ago
That's not how MoE works. You need to train the FFN directly, or else the FFN gate would have no clue how to activate the expert.
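
To make that concrete, here's a rough sketch of a top-k MoE forward pass in plain Python. It's not any particular model's implementation: the names (`moe_forward`, `gate_W`, `ffn`) are made up for illustration, and I'm assuming a Switch-style layer where each selected expert's output is scaled by its gate probability. The point is that the gate is just another learned linear layer, and the only signal it gets comes through the outputs of the experts it picked.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ffn(expert, x):
    # One expert: a two-layer FFN with ReLU, i.e. W2 @ relu(W1 @ x).
    W1, W2 = expert
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

def moe_forward(x, gate_W, experts, k=1):
    # The gate is a learned linear layer producing one logit per expert.
    probs = softmax(matvec(gate_W, x))
    # Keep the k experts with the highest gate probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    y = [0.0] * len(experts[0][1])
    for i in top:
        e_out = ffn(experts[i], x)
        # Assumption: outputs are scaled by probs[i] (no renormalization),
        # so the loss gradient reaches gate_W only via probs[i] * e_out.
        y = [acc + probs[i] * v for acc, v in zip(y, e_out)]
    return y, top, probs
```

If `gate_W` was never trained against these particular experts, `probs` has no relationship to which expert is actually good for `x`, so the routing is effectively arbitrary.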