|
|
|
|
|
by pgorczak
816 days ago
|
|
I don’t think commonly used LLM architectures have internal state that carries over between inference steps, so shouldn’t that be none? Unless you mean the previously generated tokens up to the context limit which is well defined. |
|
My conjecture is that the LLM "knows" some things that it does not put into words. I don't know what it is, but it seems wasteful to drop the entire state on every token. I even suspect that there is something like a "single logic step" of some conclusions from the context. Though I may be committing the fallacy of thinking in symbolic terms of something that is ultimately statistical.