|
|
|
|
|
by whimsicalism
753 days ago
|
|
> LLM starts from scratch for each token generated What do you mean? They get to access their previous hidden states in the next greedy decode using attention, it is not simply starting from scratch. They can access exactly what they were thinking when they put out the previous word, not just reasoning from the word itself. |
|