| HN Mirror



	by lumost 54 days ago
	The latent state an LLM works with when it chooses each token runs to tens or even hundreds of GB. It's not really useful to look at LLMs from the perspective of the prediction head, which is a very small part of the model.


RaftPeople 50 days ago

Late response, was out of town.

Agreed there is significant information in the latent space, but what is missing is a fully resolved "thought" based on that information plus current context plus validation against an internal working model of the world.

MarkusQ 54 days ago

Except that latent space does not change in response to new information, something that thoughts famously do. If you read a book that captures the author's thoughts, disagree, and write an eloquent argument to the author, you might change the author's mind. But you will not change the "book's thoughts" on the subject.

Latent spaces are maps of thoughts other people have had, not the thoughts themselves.

lumost 54 days ago

This gets a bit tricky. Over very long task contexts (1M tokens) or with prompt compression (10s of millions of tokens), the model can alter its priors based on updated evidence. This form of knowledge-based learning is not necessarily robust, but it demonstrably does occur.

MarkusQ 53 days ago

"the model can alter its priors"

The model doesn't have high-level priors in the Bayesian sense (though you could have priors about it).

The low-level priors it does have (the weights) are not modified by the context.
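
To make the distinction concrete, here is a rough toy sketch (not a real LLM, and not anyone's actual architecture). The names `WEIGHTS`, `EMBED`, and `next_token_probs` are made up for illustration, and the "hidden state" is just a mean of token embeddings. It only shows that changing the context changes the activations and the predicted distribution, while the weights stay exactly as they were.

```python
import copy
import math

# Hypothetical frozen parameters: 3 output tokens, 2-dimensional hidden state.
WEIGHTS = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
# Hypothetical token embeddings (also fixed).
EMBED = {"a": [1.0, 0.0], "b": [0.0, 1.0]}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(context, weights):
    """Forward pass: reads the weights, never writes to them."""
    # Stand-in for the hidden state: the mean of the context's embeddings.
    hidden = [sum(EMBED[t][i] for t in context) / len(context) for i in range(2)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(logits)


snapshot = copy.deepcopy(WEIGHTS)

probs_a = next_token_probs(["a", "a", "b"], WEIGHTS)
probs_b = next_token_probs(["b", "b", "a"], WEIGHTS)

# The context changed the prediction, but the weights are untouched.
weights_unchanged = WEIGHTS == snapshot
print("predictions differ:", probs_a != probs_b)
print("weights unchanged:", weights_unchanged)
```

In this toy, whatever "updating" happens lives entirely in the per-context activations, which is roughly what both sides of the thread are describing from different angles.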
