by marci
32 days ago
"That’s where EMO comes in. We show that EMO – a 1B-active, 14B-total-parameter (8-expert active, 128-expert total) MoE trained on 1 trillion tokens – supports selective expert use: for a given task or domain, we can use only a small subset of experts (just 12.5% of total experts) while retaining near full-model performance." https://allenai.org/blog/emo
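For context on what "selective expert use" means mechanically: in a top-k MoE, a router scores every expert for each token and only the k highest-scoring experts run. With the quoted numbers (8 active out of 128), using 12.5% of experts means restricting routing to 16 experts. Below is a minimal sketch of a generic masked top-k router. It is not EMO's code. It assumes softmax mixing weights over the chosen experts (one common convention), and the 16-expert subset is an arbitrary, hypothetical choice, since the quote doesn't say how the subset for a task is picked.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, per the quoted figures
TOP_K = 8          # experts active per token, per the quoted figures


def route(logits, top_k=TOP_K, allowed=None):
    """Return (expert_ids, weights) for one token.

    logits:  router scores, one per expert.
    allowed: optional set of expert ids the token may use. Experts outside
             it are masked out before top-k selection (hypothetical restriction).
    """
    candidates = [i for i in range(len(logits)) if allowed is None or i in allowed]
    # Keep the top_k highest-scoring candidates, best first.
    chosen = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the chosen experts' scores gives the mixing weights.
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


# Usage: 12.5% of 128 experts = 16. This particular subset is hypothetical.
random.seed(0)
subset_size = int(NUM_EXPERTS * 0.125)
subset = set(range(subset_size))
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

ids, weights = route(logits, allowed=subset)    # restricted to 16 experts
full_ids, full_weights = route(logits)          # unrestricted, all 128 eligible
print(ids, [round(w, 3) for w in weights])
```

The only change from ordinary top-k routing is the mask: each token still activates 8 experts, but they are drawn from the 16 allowed ones, so the remaining 112 experts' weights would never need to be loaded for that task.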