by HarHarVeryFunny 810 days ago
There is an end-of-sequence token appended to the input sequence, and this is what is transformed into the predicted next token.
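A rough sketch of the mechanism described above, in case it helps. Everything here is a toy assumption: the vocabulary, the random weights, and `causal_mix` (a uniform causal average standing in for the real transformer layers) are hypothetical, and whether a given model actually appends such a token depends on the architecture in question. The only point illustrated is that the vector at the final (appended) position is the one projected to vocabulary logits.

```python
import random

# Hypothetical toy vocabulary; index 0 is the end-of-sequence token.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]
EOS_ID = 0
DIM = 8

rng = random.Random(0)
# Random weights standing in for a trained model (assumption, not real parameters).
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
UNEMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def causal_mix(vectors):
    """Stand-in for the transformer layers: each position averages
    itself and all earlier positions (never later ones)."""
    mixed = []
    for i in range(len(vectors)):
        window = vectors[: i + 1]
        mixed.append([sum(v[d] for v in window) / len(window) for d in range(DIM)])
    return mixed


def predict_next(token_ids):
    """Append EOS, run the toy layers, and turn the vector at the
    appended position into a predicted next-token id."""
    seq = list(token_ids) + [EOS_ID]
    hidden = causal_mix([EMBED[t] for t in seq])
    last = hidden[-1]  # the vector at the appended EOS position
    logits = [sum(w * x for w, x in zip(row, last)) for row in UNEMBED]
    next_id = max(range(len(VOCAB)), key=lambda i: logits[i])
    return next_id, hidden


next_id, hidden = predict_next([1, 2, 3])  # "the cat sat"
print(VOCAB[next_id], len(hidden))
```

With random weights the predicted token is meaningless; the sketch only shows which position feeds the prediction.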