|
|
|
|
|
by minimaltom
163 days ago
|
|
Analogies are all messy here, but I would compare the values of the residual stream to what you are describing as thought. We force this residual stream to project to the logprobs of all tokens, just as a human in the act of speaking a sentence is forced to produce words. But could this residual stream represent thoughts which don't map to words? Its plausible, we already have evidence that things like glitch-token representations trend towards the centroid of the high-dimensional latent space, and logprobs for tokens that represent wildly-branching trajectories in output space (i.e. "but" vs "exactly" for specific questions) represent a kind of cautious uncertainty. |
|