by causal
762 days ago
So most of my understanding comes from this series, particularly the last two videos: https://www.3blue1brown.com/topics/neural-networks

Essentially, each token of a text occupies a point in a high-dimensional vector space that represents meaning, and LLMs predict the next token by updating the last token's vector with context from all the tokens before it. Attention heads are basically a way of choosing which prior tokens are most relevant and adjusting the last token's point in that vector space accordingly.
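To make that concrete, here is a rough sketch of one attention step for the last token. It is heavily simplified and rests on assumptions of mine: the 2-dimensional embeddings are made up, and the raw embeddings stand in for the queries, keys, and values, whereas a real attention head applies learned projection matrices first. The function name `attend_last_token` is hypothetical too.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_last_token(embeddings):
    """One simplified attention step for the final token.

    Assumption: embeddings are used directly as queries, keys, and values.
    Real models apply learned projections before this step.
    """
    query = embeddings[-1]
    scale = math.sqrt(len(query))

    # Relevance of every token (including the last one) to the last token.
    scores = [dot(query, key) / scale for key in embeddings]
    weights = softmax(scores)

    # Weighted average of all token vectors: the "context".
    dim = len(query)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, embeddings))
        for i in range(dim)
    ]

    # Nudge the last token's point in vector space by adding the context.
    updated = [q + c for q, c in zip(query, context)]
    return weights, updated


# Made-up 2-D embeddings for three tokens, purely for illustration.
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]

weights, updated = attend_last_token(embeddings)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
print("updated last-token vector:", updated)
```

With these toy vectors, "cat" and "sat" point in the same direction, so "sat" puts more weight on "cat" than on "the". A real model runs many heads like this in parallel across many layers, then turns the final vector into next-token probabilities.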