by swyx 6 days ago
see Beyond Unit Tests and Novel Grading Methods in TFA.

i think it's something like ~60% llm-as-judge rubrics and the rest as described. every rubric validated by maintainer. 3000 rubrics total