| HN Mirror


	by nimitkalra 434 days ago
	LLM judges have technical quirks that make them particularly high-variance, sensitive to artifacts in the prompt, and skewed positively or negatively. These are a different problem from the subjectivity of human judges. They largely arise from the model's training distribution and post-training, and can be contained with careful calibration.
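
To make "variance", "skew", and "calibration" concrete, here is a minimal sketch of one way they could be measured and corrected. It is not a method described in the comment: the function names, the toy scores, and the choice of a linear least-squares fit against human reference labels are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def skewness(scores):
    """Population skewness: negative = tail toward low scores, positive = tail toward high."""
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        return 0.0  # all scores identical, no asymmetry to measure
    return mean([((x - m) / s) ** 3 for x in scores])


def fit_linear_calibration(judge_scores, human_scores):
    """Least-squares fit of human ~= a * judge + b (an assumed, very simple calibration)."""
    mj, mh = mean(judge_scores), mean(human_scores)
    var_j = sum((j - mj) ** 2 for j in judge_scores)
    if var_j == 0:
        return 0.0, mh  # judge gives a constant score; fall back to the human mean
    cov = sum((j - mj) * (h - mh) for j, h in zip(judge_scores, human_scores))
    a = cov / var_j
    return a, mh - a * mj


def calibrate(score, a, b):
    """Map a raw judge score onto the human scale."""
    return a * score + b


# Hypothetical data: the same answer scored 8 times by the judge (same prompt, resampled).
repeated_runs = [8, 9, 9, 7, 9, 8, 9, 9]
spread = pstdev(repeated_runs)   # run-to-run variance of the judge
skew = skewness(repeated_runs)   # asymmetry of its score distribution

# Hypothetical paired scores on a small human-labelled reference set.
judge = [7, 8, 9, 9]
human = [4, 6, 8, 8]
a, b = fit_linear_calibration(judge, human)
```

In practice the reference set would need to be much larger, and a monotone or per-rubric mapping may fit better than a straight line; the point is only that the skew is measured against human labels and then corrected, rather than taken at face value.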