Hacker News
riku_iki 217 days ago
It's an MoE; each expert tower can be branched from some smaller model.
jychang 213 days ago
That's not how MoE works. You need to train the FFN directly, or else the FFN gate would have no clue how to activate the expert.
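To make the point about the gate concrete, here is a rough sketch of a top-k MoE layer in plain Python. It is not any particular model's implementation: the names (`moe_forward`, `make_expert`, `gate_w`), the sizes, and the random weights are all made up for illustration. What it shows is that the gate is just a learned matrix scoring each expert per input, so its weights only mean something if they were trained against the experts they route to.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def make_expert(rng, dim, hidden):
    # Hypothetical expert: a two-layer ReLU FFN with random weights.
    w1 = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(dim)]
    def ffn(x):
        h = [max(0.0, v) for v in matvec(w1, x)]
        return matvec(w2, h)
    return ffn

def moe_forward(x, gate_w, experts, k=2):
    # The gate produces one score per expert for this input.
    logits = matvec(gate_w, x)
    # Keep the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the scores over the chosen experts only.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top, weights

rng = random.Random(0)
DIM, HIDDEN, N_EXPERTS = 4, 8, 4  # illustrative sizes, not from the thread
experts = [make_expert(rng, DIM, HIDDEN) for _ in range(N_EXPERTS)]
gate_w = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(N_EXPERTS)]

x = [1.0, -0.5, 0.25, 2.0]
out, chosen, weights = moe_forward(x, gate_w, experts, k=2)
print("experts chosen:", chosen)
print("mixing weights:", weights)
```

With random `gate_w`, as here, the routing is arbitrary. In training, gradients flow through `weights` into `gate_w`, which is how the gate learns which expert suits which input.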