HN Mirror



	by SilverElfin 128 days ago
	Some people have claimed that LLMs that aren't from the big foundation model providers (OpenAI, Anthropic, Google) are basically gaming benchmarks to get great results. Does anyone know if that's actually true? I don't understand all of this post, but judging from the tables of benchmark scores, this model seems to perform well across a wide variety of tasks. It feels to me like that diversity of benchmarks suggests it's not just something built to game one benchmark, right?

1 comment

viraptor 128 days ago

Why not just check on your real tasks? I'm quite happy with k2.5 and glm5 performance in practice. Whether they also gamed the benchmarks matters less.
