| HN Mirror



	by akomtu 382 days ago
	LLMs can't evaluate their own output. They suggest possibilities, but can't evaluate them. Imagine an insane man who is rambling on about something smart but never self-reflects. Evaluation is done against some framework of values that are taken as true: the rules of a board game, the syntax of a language, or something else. LLMs also can't fabricate evaluation, because evaluation is a rather rigid and precise model, unlike natural language. Otherwise you could set up two LLMs questioning each other.
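A minimal sketch of the split described above, assuming Python and using the standard library's `ast.parse` as the rigid evaluator (the "language syntax" case). The `proposals` list is a hypothetical stand-in for model output. Acceptance is decided by a precise external check, not by the generator. A syntax check only proves the code parses, not that it is correct.

```python
import ast
from typing import Callable, Iterable, Optional


def is_valid_python(source: str) -> bool:
    """Rigid evaluator: accept the source only if it parses as Python syntax."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def first_valid(candidates: Iterable[str], check: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that passes the external check, or None."""
    for candidate in candidates:
        if check(candidate):
            return candidate
    return None


# Hypothetical stand-in for LLM output: a fixed list of proposed snippets.
proposals = [
    "def add(a, b) return a + b",        # missing colon: rejected by the parser
    "def add(a, b):\n    return a + b",  # valid syntax: accepted
]

print(first_valid(proposals, is_valid_python))
```

The generator can stay as loose as it likes. Only the checker needs to be exact.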

2 comments

candiddevmike 382 days ago

Isn't this kind of the hope/dream of multi-agent systems where one LLM "coordinates" among others or checks the responses? In my experience it works about as well as you're describing.


izabera 382 days ago

Oh boy, do I have the paper for you: https://proceedings.neurips.cc/paper_files/paper/2014/file/f...


mdp2021 382 days ago

Sorry, what do GANs have to do with this? It is not the same kind of "evaluation".

And anyway, there is no need for two networks to iteratively refine output: one suffices (as we ourselves are naturally meant to do).
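For context, this assumes the linked NeurIPS 2014 paper is the original GAN paper, as the reply suggests. A GAN trains a generator $G$ against a discriminator $D$ with this minimax objective:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

Here $D$ is a learned judge of whether a sample looks like real data. That is the distinction the reply draws: checking output against fixed rules (a board game, a syntax) is a different kind of evaluation.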
