by wnmurphy
816 days ago
I think it's fairly simple: you're creating space for intermediary tokens to be generated, where those intermediary tokens represent "thoughts" or a simulated internal dialog. Without that, it's analogous to asking someone a question and having them immediately start responding from some information they'd heard before, rather than taking some time to have an inner dialog with themselves.
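
A rough way to see why the extra tokens matter is a toy cost model. This is a sketch under my own assumptions, not anything from the paper: it only counts how many cached positions a decoder reads through attention when it predicts one token, and it ignores the MLP blocks and everything else. The function names and the default layer/head counts are made up for illustration.

```python
def attention_reads_for_next_token(context_len: int,
                                   num_layers: int = 12,
                                   num_heads: int = 12) -> int:
    """Toy count of cached positions read to predict one token.

    Assumption: each head in each layer attends over every token
    seen so far, so the count grows linearly with context length.
    """
    return context_len * num_layers * num_heads


def reads_with_intermediate_tokens(prompt_len: int,
                                   num_intermediate: int,
                                   num_layers: int = 12,
                                   num_heads: int = 12) -> int:
    """Same count after appending intermediate ("thinking" or pause) tokens."""
    return attention_reads_for_next_token(
        prompt_len + num_intermediate, num_layers, num_heads
    )


if __name__ == "__main__":
    direct = attention_reads_for_next_token(20)
    padded = reads_with_intermediate_tokens(20, 10)
    print(f"answer immediately: {direct} reads")
    print(f"after 10 extra tokens: {padded} reads")
```

In this toy model the only way to spend more computation on the answer is to put more tokens in front of it, which is the "space" the intermediary tokens create.
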
> However sophisticated this end-to-end process may be, it abides by a peculiar constraint: the number of operations determining the next token is limited by the number of tokens seen so far.
There are obviously pros and cons to each, but nothing excludes us from combining the two either.
1. Think before you speak: Training Language Models With Pause Tokens https://arxiv.org/abs/2310.02226v2