| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by dan-robertson 18 days ago
	The super literal interpretation ideas were much more common in the past when LLMs didn’t exist. Now we have models that are generally pretty good at picking up on nuance and understanding what you mean but also often quite bad at execution, which is roughly the opposite of that idea. I think reward hacking is perhaps the closest we see llms get to literal/malicious interpretations of instructions.

1 comments

wizzwizz4 18 days ago

LLMs are neither of those. They're quite good at pretending they understand what you mean, but they don't. That's why they can't execute: they're mimicking the form, not the substance, and then we see the form and anthropomorphise them in our minds.

link

Timwi 18 days ago

That's a lot of assertions with no real argument to back it up.

link

customguy 18 days ago

Any one of those "hey, can you count to 100 for me?" type shorts should be enough..

link

wizzwizz4 17 days ago

I've repeated the argument over and over since the GPT-2 days, when I derived it theoretically by inspecting the architecture of the model. I am now fatigued, and enough other people have taken up similar arguments – some developed half-way to a mathematical proof – that I no longer feel the obligation to keep repeating myself.

link

dmos62 17 days ago

You could post a link.

link