by versteegen
773 days ago
> How does it know what the next street or neighbourhood it should traverse in each step without a pathfinding algo?

Because Transformers are 'AI-complete'. Much is made of (decoder-only) transformers being next-token predictors, which misses the fact that large transformers can "think" before they speak: there are many layers between input and output. They can form a primitive high-level plan at a certain layer of a certain token, such as the last input token of the prompt (e.g. go from A to B via approximate midpoint C). They can then refer back to that plan on every following token while expanding it with details (A to C via D). Their working memory grows with the number of input+output tokens, and with each additional layer they can elaborate on an earlier representation such as a 'plan'. However, the number of sequential steps in any internal computation (one not 'saved' as an output token) is limited by the number of layers. This limit can be worked around with chain-of-thought, which is why I call them AI-complete. I'm writing all of this hypothetically, not based on mechanistic interpretability experiments.
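To make the layer-limit vs chain-of-thought point concrete, here's a toy analogy in Python. It is not a model of real transformer internals. Assume (hypothetically) that whatever the network has learned can be summarised as a `next_hop` table pointing one step closer to the destination, and that each layer can perform at most one sequential hop. A single forward pass then gets at most `num_layers` hops along the route. Emitting each intermediate stop as an output token and feeding it back (chain-of-thought) lets it go arbitrarily far. The function names and the route table are made up for illustration.

```python
from typing import Dict, List


def forward_pass(context: List[str], next_hop: Dict[str, str], num_layers: int) -> str:
    """Toy stand-in for one forward pass (hypothetical): each 'layer' may do
    one sequential step, i.e. follow one hop, so at most num_layers hops."""
    node = context[-1]
    for _ in range(num_layers):
        if node not in next_hop:  # no further hop: we're at the destination
            break
        node = next_hop[node]
    return node


def answer_directly(start: str, next_hop: Dict[str, str], num_layers: int) -> str:
    """Answer in a single token: limited by the number of layers."""
    return forward_pass([start], next_hop, num_layers)


def answer_with_chain_of_thought(
    start: str, next_hop: Dict[str, str], num_layers: int, max_tokens: int = 100
) -> List[str]:
    """Emit intermediate stops as tokens and feed them back in, so the
    total number of sequential steps grows with the number of tokens."""
    context = [start]
    for _ in range(max_tokens):
        token = forward_pass(context, next_hop, num_layers)
        if token == context[-1]:  # no progress possible: destination reached
            break
        context.append(token)
    return context
```

With a four-hop route A → D → C → E → B and only two "layers", answering directly gets stuck at C. The chain-of-thought version emits C as an intermediate token and then reaches B:

```python
route = {"A": "D", "D": "C", "C": "E", "E": "B"}
answer_directly("A", route, num_layers=2)               # "C"
answer_with_chain_of_thought("A", route, num_layers=2)  # ["A", "C", "B"]
```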