by zozbot234
125 days ago
The open-weight models are great, but they're roughly a full year behind frontier models. That's a lot. There are also plenty of use cases where running a generic Chinese-made model may be less than advisable, and OpenAI/Anthropic have the know-how to create custom models where appropriate. That can be quite valuable.
|
Artificial Analysis isn't perfect, but they are an independent third party that actually runs the benchmarks themselves, and they use a wide range of benchmarks. It's a better automated litmus test than any other I've been able to find in years of watching the development of LLMs.
And the gap has been rapidly shrinking: https://www.youtube.com/watch?v=0NBILspM4c4&t=642s