by kelseyfrog
816 days ago
There's a recent paper which seeks to explicitly perform time-to-think using pause tokens[1].

> However sophisticated this end-to-end process may be, it abides by a peculiar constraint: the number of operations determining the next token is limited by the number of tokens seen so far.

There are obviously pros and cons to each, but nothing excludes us from combining the two either.

1. Think before you speak: Training Language Models With Pause Tokens https://arxiv.org/abs/2310.02226v2
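
To make the quoted constraint concrete, here is a rough sketch of the pause-token idea as I understand it: pad the prompt with extra tokens so the model processes more positions before it has to commit to an answer. This is a toy illustration, not the paper's code. The `<pause>` string and all function names are made up for the example, and it only counts positions rather than running a model.

```python
# Toy illustration only: counts token positions, does not run a model.
# PAUSE and every function name here are hypothetical, chosen for this sketch.
PAUSE = "<pause>"


def with_pauses(prompt_tokens, num_pauses):
    """Return the prompt followed by num_pauses pause tokens."""
    return list(prompt_tokens) + [PAUSE] * num_pauses


def positions_before_answer(prompt_tokens, num_pauses=0):
    """Count the positions processed before the first answer token is emitted."""
    return len(with_pauses(prompt_tokens, num_pauses))


def strip_pauses(tokens):
    """Drop pause tokens so they never show up in the visible text."""
    return [t for t in tokens if t != PAUSE]


prompt = ["What", "is", "2", "+", "2", "?"]
padded = with_pauses(prompt, 10)

print(positions_before_answer(prompt))      # no pauses
print(positions_before_answer(prompt, 10))  # ten extra positions to "think"
print(strip_pauses(padded) == prompt)       # pauses are invisible after stripping
```

The point is just that the budget of positions grows with the number of pauses while the visible prompt stays the same. Whether the model makes good use of that extra room is a separate question.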