| HN Mirror

	by bradknowles 1027 days ago
	How is this benchmark not inherently biased towards GPT? If I did the same sort of thing but used Claude to grade the tests, would I get similar results? Or would that be inherently biased towards Claude scoring high?