Hacker News
by
schleck8
788 days ago
You absolutely can since it has a size advantage either way. MoE means the expert model performs better BECAUSE of the overall model size.
cjbprime
788 days ago
Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.