They learn the value of specific actions in specific contexts based on the rewards they receive during play. Those specific actions and contexts don't transfer well to other games, for various reasons. John noted, for example, that varying frame rates and variable latency between action and effect really confuse the models.
Just go and ask ChatGPT or Claude something that can't possibly be in their training sets. Make something up. If they were only memorising answers, it would be impossible for them to get the correct result.
A simple nonsense programming task would suffice. For example: "Write a Python function to erase every character from a string unless either of its adjacent characters is also adjacent to it in the alphabet. The string only contains lowercase a-z."
That task isn't anywhere in their training sets, so they can't have memorised the answer. But I bet ChatGPT and Claude can still do it.
Honestly, this is sooooo obvious to anyone who has used these tools; it's really insane that people are still parroting (heh) the "it just memorises" line.
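For reference, here is one hand-written reading of that made-up task, so it's clear what a correct answer would have to do. The prompt is ambiguous, so this sketch assumes two things: there is no wrap-around between 'z' and 'a', and "adjacent characters" are judged in the original string rather than after earlier deletions. The function name `keep_alphabet_neighbours` is just illustrative.

```python
def keep_alphabet_neighbours(s: str) -> str:
    """Keep s[i] only if a neighbouring character in the string is
    adjacent to it in the alphabet (e.g. 'b' next to 'a' or 'c').

    Assumptions (the prompt doesn't say): lowercase a-z input, no
    wrap-around from 'z' to 'a', and neighbours are checked in the
    original string, not after earlier characters have been erased.
    """
    kept = []
    for i, ch in enumerate(s):
        # Collect the characters immediately before and after position i.
        neighbours = []
        if i > 0:
            neighbours.append(s[i - 1])
        if i < len(s) - 1:
            neighbours.append(s[i + 1])
        # Alphabet-adjacent means the character codes differ by exactly 1.
        if any(abs(ord(ch) - ord(n)) == 1 for n in neighbours):
            kept.append(ch)
    return "".join(kept)


print(keep_alphabet_neighbours("abxyzq"))  # prints abxyz
```

Here "q" is dropped because its only neighbour, "z", is not next to it in the alphabet, while "x" survives because "y" follows it.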
LLMs don't "memorize" concepts like humans do. They generate output based on token patterns in their training data. So instead of having to be trained on every possible problem, they can still generate output that solves a new one by referencing the most probable combination of tokens for the given input tokens. To humans this looks like truly solving novel problems, but it's merely a trick of statistics. These tools can reference and generate patterns that no human ever could. This is what makes them useful and powerful, but I would argue not intelligent.
It’s really easy: go to Claude and ask it a novel question. It will generally reason its way to a perfectly good answer even if there is no direct example of it in the training data.
When LLMs come up with answers to questions that aren't directly exemplified in the training data, that's not proof at all that they reasoned their way there. It can very much still be pattern matching, without any insight from actually executing the code that generates the answer.
If we were taking a walk and you asked me to explain a mathematical concept I haven't actually studied, I could hazard a casual guess within seconds, based on the other topics I have studied. That is the default approach of an LLM, except with far greater breadth and recall of studied topics than I have as a human.
That would be very different from sitting down in a library, where I would apply the concepts and theorems I already know to make inferences, build on them, and derive an understanding by reasoning through the steps I took (often after backtracking from several dead ends) before giving you the explanation.
If you ask an LLM to explain its reasoning, it's unclear whether it is just guessing at the explanation and reasoning too, or whether those were actually the steps it took to reach the first answer it gave you. This is why LLMs are able to correct themselves after claiming "strawberry" has 2 r's: when providing (again, guessing) their explanations, they make more "relevant" guesses.
I'm not sure what "just guessed" means here. My experience with LLMs is that their "guesses" are far more reliable than a human's casual guess. And, as you say, they can provide cogent "explanations" of their "reasoning." You say they might be "just guessing" at the explanation, but what does that really mean if the explanation is cogent and offers at least a plausible account of the behavior? (By the way, I'm sure you know that plenty of people think human explanations for their own behavior are also mere narrative reconstructions.)
I don't have a strong view about whether LLMs are really reasoning, whatever that might mean. But the point I was responding to was the claim that LLMs have simply memorized all the answers. That is clearly not true under any normal meaning of those words.
You have probably seen examples of LLMs doing the "mirror test", i.e. identifying themselves in screenshots and referring to the screenshot in the first person. That is a genuinely novel question, as an "LLM mirror test" wasn't a concept that existed until about a year ago.
Yeahhhh, why isn't there a training structure where you play 5000 games and the reward function is based on doing well in all of them?
I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer-term goals. "Press whatever sequence I need to over this time to end up closer to this result."
There is some kind of nested, multidimensional thing to train on here, instead of a limited set of immediate choices.
Well, yeah... if you had only ever played one game in your life, you would probably be pretty shit at other games too. This does not seem very revealing to me.
I don't think that's true. If you'd only ever played Doom, I think you could play, say, Counter-Strike or Half-Life and be pretty good at it, and I think Carmack is right that it's pretty interesting that this doesn't seem to be the case for AI models.