HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tjbai 558 days ago

The last hidden state is just the output embedding after N residual layers, i.e. input embedding + res1 + res2 + ... + resN.
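A minimal sketch of that sum, assuming toy 2-dimensional vectors and two made-up residual blocks (`layer_a` and `layer_b` are hypothetical stand-ins for attention/MLP sublayers). Each block reads the current stream and adds its output back, so the final state equals the input embedding plus every block's update. It ignores the final layer norm that many models apply before the unembedding.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


def residual_stream(
    embedding: Vector, layers: List[Callable[[Vector], Vector]]
) -> Tuple[Vector, List[Vector]]:
    """Run the stream through each block; return the last hidden state and every update."""
    h = embedding
    updates: List[Vector] = []
    for layer in layers:
        delta = layer(h)   # res_i depends on everything written to the stream so far
        updates.append(delta)
        h = add(h, delta)  # the block's output is added back to the stream
    return h, updates


# Hypothetical toy blocks standing in for attention / MLP sublayers.
def layer_a(h: Vector) -> Vector:
    return [0.5 * v for v in h]


def layer_b(h: Vector) -> Vector:
    return [1.0 for _ in h]


last_hidden, updates = residual_stream([1.0, 2.0], [layer_a, layer_b])

# Rebuild it as input embedding + res1 + res2.
reconstructed = [1.0, 2.0]
for u in updates:
    reconstructed = add(reconstructed, u)

print(last_hidden, reconstructed)  # [2.5, 4.0] [2.5, 4.0]
```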

There's typically an "unembedding layer"/"classification head" that uses this hidden state to produce a softmax distribution over the LLM's vocabulary. In this case, we can think of this as "snapping" the hidden state into a single token and feeding that token into the next position of the autoregressive LLM.
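A toy sketch of the unembedding and "snapping" step, assuming a 3-token vocabulary, 2-dimensional hidden states, hand-picked weights, and greedy (argmax) selection; real decoders often sample from the distribution instead. `unembed`, `embed_table` and `snap_to_token` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def snap_to_token(
    hidden: Vector, unembed: Matrix, embed_table: Matrix
) -> Tuple[int, Vector, Vector]:
    """Project to vocabulary probabilities, pick the most likely token, re-embed it."""
    probs = softmax(matvec(unembed, hidden))                 # one probability per vocab entry
    token = max(range(len(probs)), key=lambda i: probs[i])   # greedy: most likely token
    return token, probs, embed_table[token]                  # input for the next position


unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # hypothetical LM head (vocab x hidden)
embed_table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical input embeddings

token, probs, next_input = snap_to_token([0.2, 0.9], unembed, embed_table)
print(token, next_input)  # 2 [0.5, 0.5]
```

Whatever the distribution looked like, only `embed_table[token]` reaches the next position.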

In this sense, the last hidden state _does_ augment the next input. The authors simply propose feeding this hidden state directly into the next step rather than reducing it to a single token, so the model reasons in continuous latent space rather than in discrete token space.
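A sketch contrasting the two feedback loops, assuming a hypothetical `step` function standing in for a full forward pass (input embedding in, last hidden state out) and a toy 2-token LM head and embedding table. In token mode the next input is always a row of the embedding table; in latent mode it is the raw hidden state. This only illustrates the data flow described above, not how the authors actually train such a model.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rollout(x0: Vector, step: Callable[[Vector], Vector], unembed: Matrix,
            embed_table: Matrix, n_steps: int, continuous: bool) -> List[Vector]:
    """Feed each step's output back in as the next input; return the inputs produced."""
    x = x0
    inputs: List[Vector] = []
    for _ in range(n_steps):
        h = step(x)  # last hidden state for this position
        if continuous:
            x = h    # latent mode: pass the hidden state straight through
        else:
            logits = matvec(unembed, h)
            token = max(range(len(logits)), key=lambda i: logits[i])  # softmax would not change the argmax
            x = embed_table[token]  # token mode: collapse to one token and re-embed it
        inputs.append(x)
    return inputs


def step(v: Vector) -> Vector:
    """Hypothetical stand-in for a transformer forward pass."""
    return [0.5 * v[0] + 0.2, 0.5 * v[1] + 0.1]


unembed = [[1.0, 0.0], [0.0, 1.0]]
embed_table = [[1.0, 0.0], [0.0, 1.0]]

discrete = rollout([0.3, 0.6], step, unembed, embed_table, 2, continuous=False)
latent = rollout([0.3, 0.6], step, unembed, embed_table, 2, continuous=True)
print(discrete)  # [[0.0, 1.0], [0.0, 1.0]]
print(latent)    # raw hidden states, not rows of embed_table
```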

intalentive 558 days ago

Moreover, “snapping” the hidden state to a token is akin to quantization: it’s lossy. By staying in latent space, the model can “reason” at “full resolution” without discretization noise.
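To make the quantization analogy concrete, a toy sketch assuming an identity LM head (logits equal the hidden state) and greedy snapping: a confident hidden state and a nearly tied one snap to the same token, so the next position cannot tell them apart even though their distributions differ a lot. The vectors are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def snap(hidden: Vector) -> int:
    """Greedy snapping with an identity LM head: pick the largest logit."""
    return max(range(len(hidden)), key=lambda i: hidden[i])


confident = [4.0, 0.0]  # strongly prefers token 0
torn = [0.1, 0.0]       # almost a tie between tokens 0 and 1

print(snap(confident), snap(torn))                # 0 0 -> identical discrete input
print(softmax(confident)[0], softmax(torn)[0])    # ~0.982 vs ~0.525
```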

snthpy 558 days ago

Sometimes discretization introduces interesting behavior, though. Compare, for example, the logistic map and its chaotic regime with the simplicity of the logistic ODE. Another example would be quantum mechanics compared to classical mechanics and determinism. The Poincaré conjecture was only interesting for n=3 due to too much connectivity in higher dimensions. Wouldn't it be interesting if consciousness only arose in such a discretized form: a case of incidental complexity and chaos introduced by the topological non-triviality that comes from quantization?
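A small numerical sketch of the logistic-map comparison, assuming r = 3.9 (inside the map's chaotic regime), a start of 0.2 and a perturbation of 1e-10, all chosen for illustration. Two nearby starting points are iterated through the map x_{n+1} = r x_n (1 - x_n), and separately evaluated with the closed-form solution of the ODE dx/dt = r x (1 - x). The map's trajectories end up separated by order one; the ODE's stay within a small multiple of the initial gap.

```python
import math
from typing import List


def logistic_map(x0: float, r: float, steps: int) -> List[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def logistic_ode(x0: float, r: float, t: float) -> float:
    """Exact solution of dx/dt = r * x * (1 - x) with x(0) = x0 in (0, 1)."""
    return x0 / (x0 + (1.0 - x0) * math.exp(-r * t))


r, x0, eps = 3.9, 0.2, 1e-10

a = logistic_map(x0, r, 200)
b = logistic_map(x0 + eps, r, 200)
map_gap = max(abs(p - q) for p, q in zip(a, b))

times = [0.1 * k for k in range(200)]
ode_gap = max(abs(logistic_ode(x0, r, t) - logistic_ode(x0 + eps, r, t)) for t in times)

print(f"map: max gap {map_gap:.3f}")  # order 1: the tiny difference gets amplified
print(f"ODE: max gap {ode_gap:.1e}")  # stays on the order of eps
```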

Don't forget, non-linearity is fundamental to the whole process; otherwise you'd just have one large linear transformation. Maybe there's a similar role for discretization? :shrug:
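The linear-collapse part of that argument can be checked directly with small integer matrices (the values are arbitrary): two stacked linear layers equal a single layer with matrix A·B, while a ReLU between them breaks additivity, so no single matrix can reproduce the stack.

```python
from typing import List

Vector = List[int]
Matrix = List[List[int]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(u: Vector, v: Vector) -> Vector:
    return [x + y for x, y in zip(u, v)]


def relu(v: Vector) -> Vector:
    return [max(0, x) for x in v]


A = [[1, 2], [3, -1]]
B = [[2, 0], [-1, 1]]
x = [1, 3]
y = [-2, 1]

# Two linear layers collapse into one linear layer with matrix A @ B.
print(matvec(A, matvec(B, x)) == matvec(matmul(A, B), x))  # True


def with_relu(v: Vector) -> Vector:
    """Linear layer, ReLU, linear layer."""
    return matvec(A, relu(matvec(B, v)))


# A linear map would satisfy f(x + y) == f(x) + f(y); this stack does not.
print(with_relu(add(x, y)), add(with_relu(x), with_relu(y)))  # [10, -5] [12, 1]
```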

soulofmischief 558 days ago

Useful information about conceptual relationships and procedure can be captured in the LM head, so there is also potential lossiness when short-circuiting it.

sweetheart 558 days ago

Wow this was the explanation that made it all click for me. Thanks so much!