| HN Mirror



	by zozbot234 125 days ago
	The open-weight models are great but they're roughly a full year behind frontier models. That's a lot. There's also a whole lot of uses where running a generic Chinese-made model may be less than advisable, and OpenAI/Anthropic have know-how for creating custom models where appropriate. That can be quite valuable.

2 comments

coder543 125 days ago

I would not say a full year, or even close to a year. GLM-5 is very close to the frontier: https://artificialanalysis.ai/

Artificial Analysis isn't perfect, but it is an independent third party that actually runs the benchmarks itself, and it uses a wide range of benchmarks. It is a better automated litmus test than any other that I've been able to find in years of watching the development of LLMs.

And the gap has been rapidly shrinking: https://www.youtube.com/watch?v=0NBILspM4c4&t=642s


zozbot234 125 days ago

Benchmarks are always fishy; you need to look at the things you'd actually use the model for in the real world. From that point of view, the SOTA for open models is quite far behind.


lancebeet 125 days ago

If benchmarks are fishy, it seems their bias would be to produce better scores than expected for proprietary models, since the proprietary vendors have more incentive to game the benchmarks.


coder543 125 days ago

No... benchmarks are not always "fishy." That is just a defense people use when they have nothing else to point to. I already said the benchmarks aren't perfect, but they are much better than claiming vibes are a more objective way to look at things. Yes, you should test for your individual use case, which is a benchmark.

As I said, I have been following this stuff closely for many years now. My opinion is not informed just by looking at a single chart, but by a lot of experience. The chart is less fishy than blanket statements about the closed models somehow being way better than the benchmarks show.


mattmaroon 125 days ago

A year is a lot right now, in the same way that the gap between a PC in 1999 and a PC in 2000 was fairly sizeable. At some point, probably soon, progress will slow, and a year's lag won't be much.
