
> But, if we allowed him to produce some tokens in silence prior to answering

Depending on how the model is implemented, this is already the case. Transformers just predict the next token, but usually we don't greedily pick the most likely one, since doing so produces cases where the model repeats the same sentence or spams tokens it really likes (the enter key). Some more sophisticated techniques, like beam search, keep several candidate sequences of tokens alive at once and try to maximise the score across all tokens in the sequence.
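To make the difference concrete, here is a rough sketch of greedy decoding next to beam search. The "model" is a hypothetical hard-coded lookup table (`TOY_MODEL`), not a real transformer, and the function names are made up for illustration. The score is assumed to be the sum of token log-probabilities, which is one common choice and not the only one.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to
# next-token probabilities. A real transformer would compute these.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_logprobs(prefix):
    """Return {token: log-probability} for the given prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix].items()}


def greedy_decode(steps):
    """Always take the single most likely next token."""
    seq, score = (), 0.0
    for _ in range(steps):
        logprobs = next_token_logprobs(seq)
        tok = max(logprobs, key=logprobs.get)
        seq, score = seq + (tok,), score + logprobs[tok]
    return seq, score


def beam_search(steps, beam_width):
    """Keep the beam_width best partial sequences at every step."""
    beams = [((), 0.0)]
    for _ in range(steps):
        # Extend every surviving sequence by every possible next token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in next_token_logprobs(seq).items()
        ]
        # Rank by total log-probability of the whole sequence so far.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_seq, greedy_score = greedy_decode(2)
beam_seq, beam_score = beam_search(2, beam_width=2)
print(greedy_seq, round(math.exp(greedy_score), 2))
print(beam_seq, round(math.exp(beam_score), 2))
```

In this toy table greedy commits to `a` first and ends up with `('a', 'x')` at probability 0.33, while a beam of width 2 keeps `b` around long enough to find `('b', 'x')` at 0.36. With `beam_width=1` the two functions behave the same.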