Hacker News
by bradknowles 980 days ago
How is this benchmark not inherently biased towards GPT?

If I did the same sort of thing but used Claude to grade the tests, would I get similar results? Or would that be inherently biased towards Claude scoring high?