|
|
|
|
|
by nextaccountic
816 days ago
|
|
This begs the question: why is it that giving them more time to "think" yields better answers, and is there any limit to that? If I make them write hundreds of pages of explanation, there must be a diminishing returns of some kind. What influences the optimal amount of thinking? My guess is that good answers are more well reasoned than answers that are short and to the point, and this is picked up in training or fine-tuning or some other step. And probably the optimal amount of thinking has something to do with the training set or the size of the network (wild guesses). |
|
This explains why GPT4 cannot accurately perform large number multiplication and decimal exponentiation. [0]
This example can extend to general natural language generation. While some answers can be immediately retrieved or generated by a "cache" / algorithm which exists in latent space, some tokens have better quality when their latent-space algorithm is executed in multiple steps.
[0] https://www.semanticscholar.org/reader/817e52b815560f95171d8...