by popinman322
198 days ago
They're comparing against open-weights models that are roughly a month behind the frontier. There's likely an implicit pro-open-weights political stance here. There are also plenty of reasons not to use proprietary US models for comparison:
The major US models haven't been living up to their benchmarks; their releases rarely include training and architectural details; they're not terribly cost-effective; their announcements often omit comparisons with non-US models; and the performance gains between releases have plateaued. A decent number of users in r/LocalLLaMA have reported switching back from Opus 4.5 to Sonnet 4.5 because Opus's real-world performance was worse. From my vantage point, trust in OpenAI, Anthropic, and Google seems to be waning, and this lack of comparison is another symptom.