If I did the same sort of thing but used Claude to grade the tests, would I get similar results? Or would that be inherently biased towards Claude scoring high?