by p1esk
848 days ago
This paper is a great illustration of how little is understood about this question. They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding. In any case, this phenomenon has little to do with lengthening the prompt with meaningful tokens; we still have no clue whether that helps.
The Impact of Reasoning Step Length on Large Language Models - https://arxiv.org/abs/2401.04925
>They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding.
More tokens means more compute for the model to use; that part is completely true.
Their guess is that the model can use the extra compute to make better predictions even when no extra information accompanies this extra "thinking time".
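To put a rough number on the "more compute" part: a common back-of-the-envelope estimate is about 2 FLOPs per parameter per token for a forward pass, ignoring the attention term. The sketch below uses that approximation. The model size and token counts are made up for illustration and are not taken from the paper.

```python
def forward_flops(n_params: int, n_tokens: int) -> int:
    """Rough forward-pass cost: ~2 FLOPs per parameter per token.

    This is an approximation that ignores the attention term.
    """
    return 2 * n_params * n_tokens


# All three values are hypothetical, chosen only to illustrate the scaling.
n_params = 7_000_000_000  # a 7B-parameter model
prompt_tokens = 100       # original prompt length
filler_tokens = 50        # appended dummy tokens carrying no information

base = forward_flops(n_params, prompt_tokens)
padded = forward_flops(n_params, prompt_tokens + filler_tokens)

# Ratio of compute with filler tokens to compute without them.
extra_ratio = padded / base
print(f"compute with filler tokens: {extra_ratio:.2f}x the baseline")
```

Under this estimate, compute grows linearly with token count whether or not the tokens mean anything. Whether the model actually turns that extra compute into better predictions is the part that remains a guess.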