by croon
113 days ago
Isn't that why noise was introduced (seed rolling/temperature/high p/low p/etc)? I mean it is still deterministic given the same parameters. But this might be misleadingly interpreted as an LLM having "thought out an answer" before generating tokens, which is an incorrect conclusion. Not suggesting you did.
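
To make the "deterministic given the same parameters" point concrete, here is a minimal toy sketch of seeded sampling with temperature and top-p. Everything in it is an assumption for illustration: the logit values, the `sample_token` and `generate` names, and the four-word vocabulary are made up, and it is not how any particular LLM stack implements sampling.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample one token from a dict of token -> logit (toy values, hypothetical)."""
    rng = rng or random.Random()
    # Temperature scaling (assumes temperature > 0), then a stable softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )
    # Top-p (nucleus) filter: keep the smallest high-probability prefix
    # whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(seed, n=10, **sampling_params):
    """Draw n tokens from a fixed toy distribution using a seeded RNG."""
    rng = random.Random(seed)
    logits = {"cat": 2.0, "dog": 1.5, "fish": 0.5, "bird": 0.1}  # made-up logits
    return [sample_token(logits, rng=rng, **sampling_params) for _ in range(n)]

# Same seed and same parameters: the "random" output repeats exactly.
run_a = generate(42, temperature=0.8, top_p=0.9)
run_b = generate(42, temperature=0.8, top_p=0.9)
print(run_a == run_b)
```

The randomness lives entirely in the seeded generator, so fixing the seed, temperature, and top-p fixes the output. That says nothing either way about whether the model plans ahead internally; it only shows where the noise comes from.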
I'm convinced that that is exactly what happens. Anthropic confirms it:
"Claude will plan what it will say many words ahead, and write to get to that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there. This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so."
https://www.anthropic.com/research/tracing-thoughts-language...