by jychang 219 days ago
That's not how MoE works. You need to train the FFN directly, or else the FFN gate would have no clue how to activate the expert.
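
To make the point concrete, here's a rough sketch of a top-k MoE forward pass in plain Python. Everything in it is made up for illustration: `moe_forward`, the two toy "experts" (stand-ins for real FFNs), and both gate weight matrices are hypothetical, not taken from any particular model or library. The gate is just a linear layer over the input, so if it was never trained alongside the experts, its scores say nothing about which expert fits the input.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_w, experts, top_k=1):
    """Route vector x through the top_k experts picked by a linear gate."""
    # One gate logit per expert: dot product of that expert's gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_w]
    probs = softmax(logits)
    # Keep the top_k experts by gate probability (ties keep the lower index).
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        # Weight each chosen expert's output by its renormalized gate probability.
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, chosen, probs

# Hypothetical toy experts standing in for FFNs.
experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]

# Assumption: an untrained gate is modeled here as all-zero weights.
untrained_gate = [[0.0, 0.0], [0.0, 0.0]]
# Assumption: a gate that has learned to score expert i by input feature i.
trained_gate = [[1.0, 0.0], [0.0, 1.0]]

x = [0.5, 3.0]
out_untrained, chosen_untrained, probs_untrained = moe_forward(x, untrained_gate, experts)
out_trained, chosen_trained, probs_trained = moe_forward(x, trained_gate, experts)
```

With the zero gate, every expert gets the same probability, so the pick comes down to tie-breaking rather than anything about `x`. With the second gate, the scores depend on the input and the routing follows them.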