by azinman2
1173 days ago
But ultimately it is predicting the next token. That's the task. Using context from what has already been predicted and what comes before it, attention mechanisms to know how words relate, and all of the intermediate embeddings and whatever they signify about the world -- all of that just makes the next-word prediction that much better.
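To make the "it's all next-token prediction" point concrete, here is a minimal sketch of the generation loop. It is a toy and not how any real LLM is implemented: the "model" is just bigram counts over a made-up corpus, and the names `train_bigram`, `next_token_probs`, and `generate` are hypothetical. A real model would compute the distribution from the whole context using attention and learned embeddings, but the outer loop has the same shape: predict a distribution, pick a token, append it, repeat.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token (toy stand-in for training)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, context):
    """Return P(next token | context).

    This toy only looks at the last token of the context. A real model
    would condition on the entire context; the interface is the same.
    """
    followers = counts.get(context[-1])
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def generate(counts, prompt, max_new_tokens):
    """Greedy autoregressive decoding: each prediction is fed back in as context."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(counts, out)
        if not probs:
            break  # no known continuation for this context
        out.append(max(probs, key=probs.get))  # take the most likely token
    return out


# Made-up corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram(corpus)
result = generate(model, ["the"], 3)
print(" ".join(result))
```

Everything that makes a real model better (longer context, attention, richer embeddings) would live inside `next_token_probs`; the loop around it would not change.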