|
|
|
|
|
by valine
386 days ago
|
|
The lower dimensional logits are discarded, the original high dimensional latents are not. But yeah, the LLM doesn’t even know the sampler exists. I used the last layer as an example, but it’s likely that reasoning traces exist in the latent space of every layer not just the final one, with the most complex reasoning concentrated in the middle layers. |
|