by lumost
54 days ago
The latent space of the LLM when it chooses each token is tens or even hundreds of GB for each word that it chooses. It's not really useful to look at LLMs from the perspective of the prediction head, which is a very small part of the model.
Agreed, there is significant information in the latent space, but what is missing is a fully resolved "thought" based on that information, plus current context, plus validation against an internal working model of the world.