| HN Mirror

	by sdenton4 255 days ago
	Indeed, human judges suck on average. And you can prompt an LLM judge to look for particular kinds of problems, then throw the ensemble of judges at an output to nitpick. (Essentially, bake in a diversity of biases through a collection of prompts.)
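
	A rough sketch of what that ensemble could look like, in case it helps. Everything here is made up for illustration: the prompt wording, the `JUDGE_PROMPTS` and `run_judges` names, and the `ask_llm` callable, which stands in for whatever model client you actually use. No real API is assumed.

```python
from typing import Callable, Dict

# Hypothetical judge prompts: each one bakes in a different bias,
# i.e. a different kind of problem to go looking for.
JUDGE_PROMPTS: Dict[str, str] = {
    "factuality": "List any factual errors in the text below.",
    "clarity": "List any confusing or ambiguous passages in the text below.",
    "style": "List any awkward or verbose phrasing in the text below.",
}


def run_judges(
    output: str,
    ask_llm: Callable[[str], str],
    prompts: Dict[str, str] = JUDGE_PROMPTS,
) -> Dict[str, str]:
    """Send the same output to every judge prompt and collect the critiques."""
    return {
        name: ask_llm(f"{instruction}\n\n{output}")
        for name, instruction in prompts.items()
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; echoes the instruction it was given."""
    first_line = prompt.splitlines()[0]
    return f"(critique for: {first_line})"


if __name__ == "__main__":
    critiques = run_judges("The moon is made of cheese.", fake_llm)
    for judge, critique in critiques.items():
        print(f"{judge}: {critique}")
```

	Swap `fake_llm` for a function that calls your model of choice. How you merge the critiques afterwards (union, voting, a final summarizer pass) is a separate design choice that this sketch leaves open.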