|
|
|
|
|
by famouswaffles
1127 days ago
|
|
Variants of common riddles actually can be solved with GPT-4, but you have to rewrite it so it doesn't look like the riddle from memory(sometimes, it's as easy as changing names to something completely different). Turns out Language models trust their memory quite a bit.
Slightly related - they won't actually use the results of tools if it differs a lot from what it expects the output to be - https://vgel.me/posts/tools-not-needed/ |
|
All they have is memory, either in the weights or the input prompt. To the extent that these models appear to reason, it is precisely in the ability to successfully substitute information from the prompt into reasoning patterns in the training data. It shouldn't be any surprise that this fails when patterns in the prompt strongly condition the model to reproduce particular patterns of reasoning (eg, many words in the riddle indicate a well known riddle, but the details are different).
I know the impulse to anthropomorphize is almost impossibly seductive, but I find that the best way to understand and use these models is to remember: they are giant conditional probability distributions for the next token.