Hacker News
Hugsun, 57 days ago
They're comparing the gap between Qwen's MoE and dense models (smaller) against the gap between Gemma's MoE and dense models (bigger). Your proposed alternative misses the point.
zozbot234, 57 days ago
Gemma's dense model has more parameters than its MoE has in total, so you could totally expect the MoE to do terribly by comparison.
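To make the parameter-count argument concrete, here is a toy sketch with made-up layer sizes (not the real Gemma or Qwen configurations). It counts feed-forward parameters only and ignores routers, attention, embeddings, and biases. `ffn_params` and `moe_params` are hypothetical helpers written for illustration. When the dense layer is bigger than the MoE's total, it is also much bigger than the MoE's active (per-token) parameters.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    # Up- and down-projection matrices of one feed-forward block; biases ignored.
    return 2 * d_model * d_ff


def moe_params(d_model: int, d_ff: int, n_experts: int, top_k: int) -> tuple[int, int]:
    """Return (total, active-per-token) FFN parameters for one MoE layer.

    The router is ignored; each token is assumed to use top_k experts.
    """
    per_expert = ffn_params(d_model, d_ff)
    return n_experts * per_expert, top_k * per_expert


# Hypothetical sizes chosen only to illustrate the argument.
dense = ffn_params(d_model=2048, d_ff=16384)
total, active = moe_params(d_model=2048, d_ff=1024, n_experts=8, top_k=2)

print(f"dense FFN params:         {dense:,}")
print(f"MoE FFN params (total):   {total:,}")
print(f"MoE FFN params (active):  {active:,}")
```

With these made-up numbers the dense layer has 2x the MoE's total parameters and 8x its active parameters, which is the situation in which the MoE would be expected to lose badly.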