Hacker News
by LASR 1073 days ago
I think the parent poster is saying that it’s grading the prompts and not the output generated from the prompts.

Yeah, I agree there. Unless you can check against the output, it’s not really telling you much.