by petesergeant
490 days ago
LLM-as-a-judge isn't telling you whether something is right or wrong; it's telling you whether a given generation is normal or an aberration, according solely to the data the model was trained on. This is one reason LLMs prefer their own answers to answers from other LLMs. "LLM as a judge" is more about addressing a failure mode of auto-regressive (one token at a time) generation, where an LLM leads itself astray because of its previous choices, than about telling you any general truth. Finally, I'd note that in every maths challenge I ever completed as a student, you were strongly advised to go back and check your own work at the end if you had time left over, and for me this usually led to catching things I'd missed the first time.
It seems their PR is willing to make much stronger claims than you will.