by Jensson
302 days ago
> Why would it not be possible for a highly generalizing model to use next token prediction for its output?

The issue is that it uses next token prediction for its training. It doesn't matter how it outputs things, but it matters how it's trained. As long as these models are trained to be next token predictors, you will always be able to find flaws that come from that training, so understanding that this is how they work makes them much easier to use.

Because it is so easy to get the model to make errors that stem from being trained to just predict tokens, people argue that this is proof they aren't really thinking. For example, take any extremely common piece of text and alter it slightly: the model will typically still output the same follow-up as for the text it has seen millions of times, even though that no longer makes logical sense. That is due to them being next token predictors instead of reasoning machines.

You might say it's unfair to abuse their weaknesses as next token predictors, but then you admit that being a next token predictor interferes with their ability to reason, which was the argument you said you don't understand.
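
To make the "slightly altered common text" failure concrete, here is a toy sketch. It is a tiny count-based predictor that only looks at the last two tokens, which is an assumption made purely for illustration; real LLMs are neural networks with far longer context, and the names `train` and `complete` and the example riddle are made up for this sketch. It only shows how something trained purely to continue familiar text can give the memorized follow-up to a prompt whose meaning has changed.

```python
from collections import Counter, defaultdict


def train(corpus, context=2):
    """Count which token follows each window of `context` tokens."""
    counts = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for i in range(len(tokens) - context):
            window = tuple(tokens[i:i + context])
            counts[window][tokens[i + context]] += 1
    return counts


def complete(counts, prompt, context=2, max_tokens=4):
    """Greedily append the most frequent next token, one token at a time."""
    prompt_tokens = prompt.split()
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        window = tuple(tokens[-context:])
        if window not in counts:
            break  # never saw this context during training
        tokens.append(counts[window].most_common(1)[0][0])
    return " ".join(tokens[len(prompt_tokens):])


# Hypothetical training data: one riddle repeated many times.
corpus = [
    "which weighs more , a pound of feathers or a pound of bricks ? "
    "they weigh the same"
] * 1000
model = train(corpus)

original = "which weighs more , a pound of feathers or a pound of bricks ?"
altered = "which weighs more , a pound of feathers or two pounds of bricks ?"

print(complete(model, original))  # they weigh the same
print(complete(model, altered))   # they weigh the same (no longer correct)
```

The altered prompt ends in the same tokens as the familiar one, so the predictor continues it the same way even though the right answer has changed.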
LLM research is trying out a lot of different things that move away from just training on next token prediction, and I buy the argument that not doing anything else would be limiting.
The model is still fundamentally a next token predictor.