by aetherspawn 265 days ago
I really just want to know how it compares to ChatGPT and Claude at various tasks, but there aren’t any graphs for that.

It will probably take a few days or a week for some in-depth benchmarks to start popping up.

The IBM article has this image showing that it's supposed to be a bit ahead of GPT OSS 120B for at least some tasks (horrible URL but oh well): https://www.ibm.com/content/dam/worldwide-content/creative-a...

So in general it's going to be worse than GPT-5 and Sonnet 4.5, but closer to GPT-5 mini. At least you can run this on-prem, which you can't do with any of those. Pretty good; it could possibly replace Qwen3 for quite a few use cases!

Edit: or perhaps not; it seems like third-party benchmarks aren't as positive.