by mkotlikov
310 days ago
Models tend to prefer output that sounds like their own. If I were running these benchmarks, I would:

1) have Gemini 2.5 Pro rank only non-Google models,
2) have Claude 4.1 Opus rank only non-Anthropic models,
3) have GPT5-thinking rank only non-OpenAI models, and
4) sum the rankings and sort by the sum.
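A rough sketch of that scheme, assuming each judge returns a best-first list of model names. The vendor table, the model identifiers, and `model-x` (a stand-in third-party model) are hypothetical placeholders, not real benchmark data.

```python
from collections import defaultdict

# Hypothetical vendor lookup; identifiers are illustrative placeholders.
VENDOR = {
    "gemini-2.5-pro": "google",
    "claude-4.1-opus": "anthropic",
    "gpt5-thinking": "openai",
    "model-x": "other",
}

# Hypothetical best-first rankings, keyed by the judging model.
JUDGE_RANKINGS = {
    "gemini-2.5-pro": ["claude-4.1-opus", "gemini-2.5-pro", "model-x", "gpt5-thinking"],
    "claude-4.1-opus": ["gpt5-thinking", "claude-4.1-opus", "gemini-2.5-pro", "model-x"],
    "gpt5-thinking": ["gemini-2.5-pro", "gpt5-thinking", "model-x", "claude-4.1-opus"],
}


def cross_vendor_rank_sum(judge_rankings: dict[str, list[str]],
                          vendor: dict[str, str]) -> list[tuple[str, int]]:
    """Sum rank positions (1 = best) given by judges from other vendors, lowest sum first."""
    totals: dict[str, int] = defaultdict(int)
    for judge, ranking in judge_rankings.items():
        # Drop models from the judge's own vendor, then number the rest 1, 2, 3, ...
        eligible = [m for m in ranking if vendor[m] != vendor[judge]]
        for position, model in enumerate(eligible, start=1):
            totals[model] += position
    # Lower summed rank is better; ties are broken by name so the order is deterministic.
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))


print(cross_vendor_rank_sum(JUDGE_RANKINGS, VENDOR))
# [('gemini-2.5-pro', 3), ('claude-4.1-opus', 4), ('gpt5-thinking', 4), ('model-x', 7)]
```

One caveat with a raw sum: each of the three judges' own models is ranked by only two judges, while a third-party model like `model-x` is ranked by all three, so it collects an extra term. Dividing each total by the number of rankings it received (a mean rank) is one way to compensate.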