by dnautics
131 days ago
"LLMs cannot backtrack." This is exactly wrong. LLMs always see everything in the past. In this sense they are more efficient than Turing machines, because (assuming a sufficiently long context) every token sees ALL previous tokens. So, in principle, an LLM could write a bunch of exploratory shit and then add a "tombstone" "token" that selectively devalues things within a certain timeframe, i.e. just the exploratory things (as judged by their RoPE positions), and thus "backtrack". I put "token" in quotes because this would not necessarily be an explicit token; it could instead be a learned group of tokens, for example. But who knows: if the thinking models use some weird pseudo-XML delimiters for thinking, it's not crazy to think that an LLM could shove this information into, say, the closing tag.
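To make the "tombstone" idea concrete, here is a minimal, hypothetical sketch of causal softmax attention in which a tombstone emitted at some position adds a large negative penalty to a range of earlier keys, for every query that comes after the tombstone. It assumes plain integer positions instead of RoPE, a hand-supplied list of `(position, start, end)` tombstones, and a hypothetical `penalty` parameter; none of this is how any current model works, and a real model would have to learn the behavior rather than have it hard-coded.

```python
import math
from typing import List, Tuple

def causal_attention_weights(
    scores: List[List[float]],
    tombstones: List[Tuple[int, int, int]],
    penalty: float = -1e9,  # hypothetical: large negative = hard mask, small = soft devaluing
) -> List[List[float]]:
    """scores[i][j] is the raw attention score of query i on key j.
    Each tombstone (pos, start, end) devalues keys in [start, end)
    for every query strictly after pos."""
    n = len(scores)
    weights: List[List[float]] = []
    for i in range(n):
        logits = []
        for j in range(i + 1):  # causal: query i only sees keys 0..i
            s = scores[i][j]
            for pos, start, end in tombstones:
                if i > pos and start <= j < end:
                    s += penalty  # the tombstone "backtracks" over this key
            logits.append(s)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        # pad future positions with zero weight to keep a square matrix
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

# Six tokens with equal raw scores; a tombstone at position 3 devalues keys 1 and 2.
w = causal_attention_weights([[0.0] * 6 for _ in range(6)], [(3, 1, 3)])
print(w[5])  # keys 1 and 2 get ~0 weight; keys 0, 3, 4, 5 share the rest
```

Using a finite `penalty` instead of `-1e9` would give the softer "devalue" behavior the comment describes, rather than a hard mask.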
|
If it wasn't clear, I am talking about the LLMs in use today, not their ultimate capabilities. All commercial models are known (or believed) to be recursively applied transformers without, e.g., backspace or "tombstone" tokens like the ones you describe here.
But yes, LLMs might absolutely someday be able to backtrack, either literally during token generation, if we allow e.g. backspace tokens (there was at least one paper that did this), or more broadly at the chain-of-thought level, with methods like the one you describe.
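For the "literal" variant, a minimal sketch of the bookkeeping a backspace token would need: the raw generated stream is folded so that each backspace deletes the most recent surviving token. The reserved id `BACKSPACE = -1` and the function name are hypothetical, and this does not reproduce any particular paper; it only shows what "deleting during generation" means at the token level, not how a model would learn when to emit the token.

```python
from typing import List

BACKSPACE = -1  # hypothetical reserved token id meaning "delete the previous token"

def apply_backspaces(generated: List[int], backspace_id: int = BACKSPACE) -> List[int]:
    """Fold a raw generated token stream into the visible sequence:
    each backspace removes the most recent token that is still visible."""
    visible: List[int] = []
    for tok in generated:
        if tok == backspace_id:
            if visible:  # a backspace at the very start has nothing to delete
                visible.pop()
        else:
            visible.append(tok)
    return visible

print(apply_backspaces([10, 11, 12, BACKSPACE, BACKSPACE, 13]))  # [10, 13]
```

One design choice this leaves open is whether the model's next step conditions on the folded `visible` sequence (true deletion) or on the raw stream including the backspaces (closer to the tombstone idea, since the deleted tokens remain in context).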