by zozbot234 83 days ago
The models are not technically comparable: the Qwen is dense, the Gemma is MoE. The ~33B models are the other way around!