by didibus
303 days ago
You seem possibly more knowledgeable than me on the matter. My impression is that LLMs predict the next token based on the prior context. They do that by having learned a probability distribution mapping the preceding tokens to the next token. So, as I understand it, the models are never reasoning about the problem itself, only about what the next token should be given the context. Chain-of-thought training just rewards them so that, instead of predicting the tokens of the final answer directly, they first predict the tokens of the reasoning that leads to the solution. Since the human language in the dataset describes many concepts and offers solutions to many problems, it turns out that predicting the text that describes the solution to a problem often ends up producing the correct solution. That this works was kind of a lucky accident, and it is where all the "intelligence" comes from.
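
To make the "learned distribution from context to next token" idea concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions: it counts bigrams (a one-token context) instead of training a neural network over a long context, and the names `train_bigram`, `next_token_distribution`, and `generate` are made up for this example, not taken from any real library.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, context):
    """Return P(next token | last token of context) as a dict.

    A real LLM conditions on the whole context; this toy model
    only looks at the final token.
    """
    options = counts.get(context[-1])
    if not options:
        return {}
    total = sum(options.values())
    return {tok: n / total for tok, n in options.items()}


def generate(counts, prompt, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(counts, out)
        if not dist:
            break  # the last token was never seen with a successor
        out.append(max(dist, key=dist.get))
    return out
```

Nothing in `generate` reasons about a problem; it only keeps asking which token is most likely to come next. Whether the much larger learned version of this loop counts as reasoning is the question being argued here.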