Hacker News
by nxtfari 63 days ago
This makes a lot of my experience with Qwen make sense. I’ve watched all the benchmarks imply how close it should be to various GPT or Claude releases, but in my own use, chatting with it or trying to get it to do agentic tasks, it was nowhere near as smart as even GPT-3.5, for example. Meanwhile Gemma 4 casually dropped and even the 4B models were performing better than Qwen 3.5 MoE in my chats. Benchmaxxing.