| HN Mirror

Analogies are all messy here, but I would compare the values of the residual stream to what you are describing as thought.

We force this residual stream to project to the logprobs of all tokens, just as a human in the act of speaking a sentence is forced to produce words. But could this residual stream represent thoughts which don't map to words?

Its plausible, we already have evidence that things like glitch-token representations trend towards the centroid of the high-dimensional latent space, and logprobs for tokens that represent wildly-branching trajectories in output space (i.e. "but" vs "exactly" for specific questions) represent a kind of cautious uncertainty.