Hacker News
by throwawayk7h 814 days ago
The next token is taken by sampling from the logits in the final column after unembedding. But doesn't that column just correspond to the last input token again? Or is the matrix resized to N+1 columns at some step?
1 comments

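As far as I know, no extra token is appended. In a GPT-style decoder with causal masking, the output at position i is trained to predict token i+1, so the final column is not the last token again: it holds the logits for the token that would come after it. The sampled token is then appended to the input, making it N+1 long, and the model is run again for the next step.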
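
For what it's worth, here is a rough sketch of how a typical GPT-style decoding loop handles this, as I understand it. Everything in it is made up for illustration: `toy_model` is a hypothetical stand-in for the real network (it just predicts "previous token plus one"), and `VOCAB_SIZE`, `generate`, and `temperature` are my own names, not anything from the visualization. The point is only that the logits in the last column are read as the prediction for position N+1, and the sampled token is appended so the next pass sees N+1 tokens.

```python
import numpy as np

VOCAB_SIZE = 8  # hypothetical toy vocabulary size


def toy_model(tokens):
    """Hypothetical stand-in for a causal transformer.

    Returns logits of shape (len(tokens), VOCAB_SIZE). Row t depends only on
    tokens up to t and is read as the prediction for position t + 1.
    This toy version simply favours (tokens[t] + 1) % VOCAB_SIZE.
    """
    tokens = np.asarray(tokens)
    logits = np.zeros((len(tokens), VOCAB_SIZE))
    logits[np.arange(len(tokens)), (tokens + 1) % VOCAB_SIZE] = 5.0
    return logits


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def generate(prompt, steps, temperature=1.0, seed=0):
    """Autoregressive loop: read the last column, pick a token, append it."""
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_model(tokens)   # one row of logits per input token
        last = logits[-1]            # prediction for the position after the input
        if temperature == 0:
            next_token = int(np.argmax(last))  # greedy decoding
        else:
            probs = softmax(last / temperature)
            next_token = int(rng.choice(VOCAB_SIZE, p=probs))
        tokens.append(next_token)    # the sequence is now one token longer
    return tokens
```

Calling `generate([1, 2, 3], steps=3, temperature=0)` runs the toy model three times, on inputs of length 3, 4, and 5, and each pass only uses the final row of logits. Real implementations usually cache the earlier columns rather than recomputing them, but the shape of the loop is the same.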