Sorry, I meant the information that is inferred (from scratch on every token) from the entire context and then reduced to a single token. Every time a token is generated, the LLM looks at the entire context and does some processing; critically, this step generates new data that is inferred from the context. The result of all that processing is then reduced to a single token.
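To make the loop I'm describing concrete, here is a toy sketch in Python. It is not a real model: `hidden_state`, `logits_from` and `generate` are made-up names, the "forward pass" is arbitrary arithmetic, and it ignores things like KV caching that real implementations use. The only point is the shape of the loop: a comparatively rich state is computed from the whole context, and only one integer from it is carried forward.

```python
import math

VOCAB_SIZE = 4    # hypothetical tiny vocabulary, token ids 0..3
HIDDEN_SIZE = 8   # hypothetical width of the internal state


def hidden_state(context):
    """Toy stand-in for a forward pass: maps the whole context to a vector."""
    state = [0.0] * HIDDEN_SIZE
    for pos, tok in enumerate(context):
        for i in range(HIDDEN_SIZE):
            # Arbitrary deterministic mixing of token id and position.
            state[i] += math.sin((tok + 1) * (i + 1) + pos)
    return state


def logits_from(state):
    """Toy output projection: one score per vocabulary entry."""
    return [
        sum(s * math.cos((i + 1) * (v + 1)) for i, s in enumerate(state))
        for v in range(VOCAB_SIZE)
    ]


def generate(context, steps):
    """Greedy decoding: each step keeps only the chosen token id."""
    context = list(context)
    for _ in range(steps):
        state = hidden_state(context)   # recomputed from the full context
        logits = logits_from(state)
        next_token = max(range(VOCAB_SIZE), key=logits.__getitem__)
        context.append(next_token)      # the 8 floats in `state` are dropped here
    return context
```

Calling `generate([0, 1], 3)` returns the two prompt tokens plus three new ones. In this toy, `state` goes out of scope at the end of every iteration, which is the "reduction" I mean.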

My conjecture is that the LLM "knows" some things that it does not put into words. I don't know what they are, but it seems wasteful to drop the entire state on every token. I even suspect that each token involves something like a "single logic step" that draws some conclusions from the context. Though I may be committing the fallacy of thinking in symbolic terms about something that is ultimately statistical.