by yukIttEft
6 days ago
> so the model figures out during training what each token should look for and what it should offer

But how does it learn this token relationship? All it has is many text samples, yet nowhere does it say how the tokens relate to each other. So where does this information come from?
The model could just as well learn to predict the next token from gibberish text, as long as there were some statistical regularities in the gibberish to learn. However, if you train it on real, meaningful text, then the statistical regularities it needs to learn (and will, thanks to gradient descent and a capable architecture) are the ones that reflect "token relationships": grammar, semantics, etc.
So you can say the "token relationships" (including word meanings) are reflected in the statistical regularities of the training data, and the model architecture and training algorithm are simply very capable of learning those regularities, whatever they may be.
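To make that concrete, here is a minimal sketch of a next-token predictor that only counts which token follows which. It is a toy bigram counter, not how an LLM is actually trained (no gradient descent, no attention), and the function names and both tiny corpora are made up for illustration. The point is only that the same procedure picks up whatever regularities the text happens to contain, gibberish or not.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpora: one meaningful, one gibberish with regularities.
real_text = "the cat sat on the mat . the cat ate the fish . the cat slept .".split()
gibberish = "zop blik zop blik zop fen zop blik".split()

real_model = train_bigram(real_text)
gib_model = train_bigram(gibberish)

print(predict_next(real_model, "the"))
print(predict_next(gib_model, "zop"))
```

Nothing in the code knows about grammar or meaning. Whatever structure shows up in the predictions came from the text it was fed.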
You can think of this as related to Word2Vec word embeddings, which are based on the idea that the meaning of a word comes from how it is used. To a first approximation, that means treating a word's meaning as defined by the words that appear near it(!), which is what the Word2Vec training algorithm does. Famous examples such as "(king - man) + woman ≈ queen" suggest that this really does capture a good deal of word meaning.
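A rough sketch of the "meaning from neighbouring words" idea, for anyone who wants to poke at it. This is not Word2Vec itself (which trains a small neural network to produce dense vectors); it just represents each word by raw counts of the words seen within a small window around it, then compares words by cosine similarity. The corpus, the window size, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: "king"/"queen" share contexts, "dog"/"cat" share others.
corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the ball",
    "the cat chases the ball",
]

vecs = cooccurrence_vectors(corpus)
sim_royal = cosine(vecs["king"], vecs["queen"])
sim_mixed = cosine(vecs["king"], vecs["dog"])

print(f"king vs queen: {sim_royal:.2f}")
print(f"king vs dog:   {sim_mixed:.2f}")
```

"king" and "queen" come out more similar to each other than "king" and "dog" purely because they appear next to the same words. Nobody told the code what a monarch is.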