| Similar arguments to LeCun. People are going to keep saying this about autoregressive models, how small errors accumulate and can't be corrected, while we literally watch reasoning models say things like "oh that's not right, let me try a different approach". To me, this is like people saying "well NAND gates clearly can't sort things so I don't see how a computer could". Large transformers can clearly learn very complex behavior, and the limits of that are not obvious from their low level building blocks or training paradigms. |
I’m not saying I disagree with your premise that errors can be corrected by spending more and more tokens, but this argument is weird to me.
The model isn’t intentionally generating text. The “oh let me try a different approach” lines I see are often followed by the same approach it just took. I wouldn’t say most of the time, but often enough that I notice.
Just because a model generates text doesn’t mean the text actually represents anything at all, let alone reflects an internal process.