	by mkotlikov 357 days ago
	Models tend to prefer output that sounds like their own. If I were to run these benchmarks, I would have 1) Gemini 2.5 Pro rank only non-Google models, 2) Claude 4.1 Opus rank only non-Anthropic models, and 3) GPT5-thinking rank only non-OpenAI models, then 4) sum up each model's rankings and sort by the sum.
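
The aggregation described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `aggregate_rankings`, the placeholder model names, and the example rankings are all hypothetical and do not come from any real benchmark run.

```python
from collections import defaultdict


def aggregate_rankings(rankings, vendor_of):
    """Sum each model's rank positions across judges and sort ascending.

    rankings:  maps a judge's vendor to its ordered list of models (best first).
    vendor_of: maps each model name to its vendor.
    A lower total means the model was ranked better overall.
    """
    totals = defaultdict(int)
    for judge_vendor, ordered_models in rankings.items():
        for position, model in enumerate(ordered_models, start=1):
            # A judge must never rank a model from its own vendor.
            if vendor_of[model] == judge_vendor:
                raise ValueError(f"{judge_vendor} judge ranked its own model {model}")
            totals[model] += position
    # Sort by summed rank, breaking ties by name for a stable result.
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Hypothetical models and rankings, for illustration only.
vendor_of = {
    "g-model": "Google",
    "a-model": "Anthropic",
    "o-model": "OpenAI",
    "x-model": "Other",
}
rankings = {
    "Google": ["a-model", "x-model", "o-model"],
    "Anthropic": ["o-model", "x-model", "g-model"],
    "OpenAI": ["a-model", "g-model", "x-model"],
}

for model, total in aggregate_rankings(rankings, vendor_of):
    print(model, total)
```

With this made-up data, the third-party model is ranked by all three judges while each vendor's own model is ranked by only two, so a plain sum penalizes the third-party model. Dividing each sum by the number of judges that ranked the model is one possible adjustment, though it goes beyond what the comment proposes.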