by akomtu 382 days ago
LLMs can't evaluate their own output. They suggest possibilities, but can't judge them. Imagine an insane man who is rambling something smart, but doesn't self-reflect. Evaluation is done against some framework of values that are considered true: the rules of a board game, the syntax of a language, or something else. LLMs also can't fabricate that evaluation, because the framework is a rather rigid and precise model, unlike natural language. Otherwise you could just set up two LLMs questioning each other.
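To make the "suggest, then evaluate against a rigid framework" split concrete, here is a minimal sketch. It assumes the framework is Python syntax, checked with the standard library's `ast.parse`. The candidate strings are hard-coded stand-ins for model output, and `first_accepted` is a hypothetical helper name, not anything from a real LLM API.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Rigid evaluator: accept only strings that parse as Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def first_accepted(candidates, evaluator):
    """Return the first candidate the evaluator accepts, or None."""
    for candidate in candidates:
        if evaluator(candidate):
            return candidate
    return None


# Hypothetical proposals; a real setup would sample these from a model.
candidates = [
    "print('hello'",      # unbalanced parenthesis
    "def f(x) return x",  # missing colon
    "print('hello')",     # parses
]

result = first_accepted(candidates, is_valid_python)
print(result)
```

The proposer can be as sloppy as it likes here. All of the judgment lives in the parser, which is exact and knows nothing about natural language.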
2 comments

Isn't this kind of the hope/dream of multi-agent systems where one LLM "coordinates" among others or checks the responses? In my experience it works about as well as you're describing.
Sorry, what do GANs have to do with this? It is not the same kind of "evaluation".

And anyway, there is no need to have two networks to iteratively refine output: one suffices (like we naturally are meant to do).