Hacker News
by armcat 25 days ago
This person did a great comparison against Qwen models, and despite having 8x fewer active params, the Qwen models outperform the Cohere model in every category: https://x.com/DJLougen/status/2057196012918149368?s=20