Even this seems too grand a claim. I’d water it down thus: the LLM encodes that the token(s) for “Spot” are probabilistically plausible in the ensuing output.
The model is enormous, and N-dimensional for very high N. But the model remains insufficiently enormous for understanding, and moreover, the model cannot observe itself and adjust.
Ask an LLM to extrapolate, see any semblance of reason collapse.
You can't be suggesting it has memorized relationships between all concepts, the model would be enormous.
So clearly, there is something else going on. It's able to encode concepts/ideas.