by
cjbprime
798 days ago
(You can't compare parameter count with a mixture of experts model, which is what the 1.8T rumor says that GPT-4 is.)
schleck8
798 days ago
You absolutely can, since it has a size advantage either way. With MoE, the expert model performs better BECAUSE of the overall model size.
cjbprime
798 days ago
Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.
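The disagreement above comes down to total parameters (everything stored) versus active parameters (what one token actually passes through). The following is a minimal Python sketch of that arithmetic. The expert count, top-k, and per-expert sizes are hypothetical numbers picked only so the total lands at the rumored 1.8T; they are not a known GPT-4 configuration. The 400B dense model is likewise treated as a plain dense model for illustration.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical MoE shape; the numbers used below are illustrative only."""
    num_experts: int        # experts available in each MoE layer
    experts_per_token: int  # top-k experts the router selects per token
    expert_params: int      # parameters in one expert (summed across layers)
    shared_params: int      # attention, embeddings, etc. used by every token

    def total_params(self) -> int:
        # Everything that has to be stored in memory.
        return self.shared_params + self.num_experts * self.expert_params

    def active_params(self) -> int:
        # Parameters touched during one token's forward pass.
        return self.shared_params + self.experts_per_token * self.expert_params


# Assumed config: 16 experts, top-2 routing, sizes chosen to total 1.8T.
moe = MoEConfig(
    num_experts=16,
    experts_per_token=2,
    expert_params=110_000_000_000,
    shared_params=40_000_000_000,
)

# A dense model uses all of its parameters for every token.
dense_total = dense_active = 400_000_000_000

print(f"MoE   total={moe.total_params() / 1e9:.0f}B active={moe.active_params() / 1e9:.0f}B")
print(f"Dense total={dense_total / 1e9:.0f}B active={dense_active / 1e9:.0f}B")
```

Under these made-up numbers the MoE stores 4.5x more parameters than the dense model but uses fewer per token, which is why a headline "1.8T vs 400B" comparison can point either way depending on which count you mean.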