by nimitkalra
387 days ago
LLM judges have technical quirks that make them particularly high-variance, sensitive to artifacts in the prompt, and skewed positively or negatively. This is a different kind of problem from the subjectivity of human judges. These quirks largely arise from the training distribution and post-training, and can be contained with careful calibration.
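To make "variance", "skew", and "calibration" concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a method described above: `judge` is a hypothetical callable that returns a numeric score, and a mean-offset correction against a small human-labeled set is only one crude form of calibration.

```python
import statistics
from typing import Callable, Sequence, Tuple

# Hypothetical judge signature (an assumption): (prompt, answer) -> numeric score.
Judge = Callable[[str, str], float]


def score_spread(judge: Judge, prompt: str, answer: str, runs: int = 5) -> Tuple[float, float]:
    """Score the same answer several times; return (mean, population std dev).

    A large std dev on identical input is the run-to-run variance in question.
    """
    scores = [judge(prompt, answer) for _ in range(runs)]
    return statistics.mean(scores), statistics.pstdev(scores)


def fit_offset(judge_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Estimate skew as the mean of (judge - human) on a labeled calibration set."""
    if not judge_scores or len(judge_scores) != len(human_scores):
        raise ValueError("need two non-empty score lists of equal length")
    return statistics.mean(j - h for j, h in zip(judge_scores, human_scores))


def calibrate(score: float, offset: float) -> float:
    """Remove the estimated skew from a raw judge score."""
    return score - offset
```

In practice one would average repeated runs to damp the variance, then subtract the fitted offset. Sensitivity to prompt artifacts can be probed the same way, by comparing `score_spread` results across reworded versions of the judging prompt.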