by famouswaffles
848 days ago
I just found this paper I read a while ago. Doesn't this answer the question? The Impact of Reasoning Step Length on Large Language Models - https://arxiv.org/abs/2401.04925

>They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding.

More tokens mean more compute time for the model to utilize; that is completely true. Their guess is that the model can use the extra compute to make better predictions even if no extra information accompanies this extra "thinking time".

This is completely orthogonal to CoT, which is simply a better prompt - it probably causes some sort of better pattern matching (again very poorly understood).