Hacker News
by
throwawayk7h
814 days ago
The next token is taken by sampling the logits in the final column after unembedding. But isn't that just the last token again? Or is the matrix resized to N+1 at some step?
1 comment
HarHarVeryFunny
814 days ago
There is an end-of-sequence token appended to the input sequence, and this is what is transformed into the predicted next token.
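To make the shapes concrete, here is a minimal sketch of one sampling step. It assumes a GPT-style decoder-only model with no special token appended, which the thread doesn't actually specify. Under that assumption the logits stay N columns wide, but column i is trained to predict token i+1, so the final column is a distribution over position N+1. The sequence only grows once the sampled token is appended and the model is run again. The names `toy_logits`, `sample_next`, and `generate` are made up for illustration, and the "model" is just random embedding and unembedding matrices with no attention layers.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_MODEL = 50, 16

# Hypothetical stand-ins for trained weights (random, for illustration only).
W_embed = rng.normal(size=(VOCAB, D_MODEL))
W_unembed = rng.normal(size=(D_MODEL, VOCAB))


def toy_logits(tokens):
    """Return logits of shape (N, VOCAB) for N input tokens.

    A real model would run causal attention layers between the
    embedding and the unembedding; this toy skips them.
    """
    hidden = W_embed[np.asarray(tokens)]   # (N, D_MODEL)
    return hidden @ W_unembed              # (N, VOCAB)


def sample_next(tokens):
    """Sample one token id from the logits in the final column."""
    last = toy_logits(tokens)[-1]          # (VOCAB,), read as a prediction for position N+1
    probs = np.exp(last - last.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(VOCAB, p=probs))


def generate(tokens, steps):
    """Append one sampled token per step; the input grows from N to N+steps."""
    tokens = list(tokens)
    for _ in range(steps):
        tokens.append(sample_next(tokens))
    return tokens
```

Calling `generate([1, 2, 3], 4)` returns a list of 7 token ids: the 3 prompt tokens followed by 4 sampled ones. Nothing gets resized inside a single forward pass; the extra position comes from feeding the longer sequence back in.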