by reqo 854 days ago
Isn’t that what the softmax layer is doing? The token with the highest probability among all the available tokens in the model dictionary is chosen as the next token!

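No. The softmax layer produces a probability distribution over the vocabulary. What you do with that is up to you. There are numerous ways to choose from that distribution: always taking the most likely token (greedy decoding) is only one of them. You can also sample from it directly, rescale it with a temperature first, or restrict sampling to the k most likely tokens (top-k) or to the smallest set of tokens whose probabilities add up to p (nucleus or top-p sampling).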
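To make the difference concrete, here is a minimal sketch in plain Python (stdlib only). The four-word vocabulary and the logits are made up for illustration, and `greedy`, `sample`, `top_k` and `top_p` are hypothetical helper names, not any library's API. Softmax only turns logits into probabilities; each helper is a different decision rule applied to that same distribution.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution that sums to 1.
    temperature < 1 sharpens the distribution, > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Greedy decoding: always pick the single most likely token (argmax)."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng):
    """Plain sampling: draw a token in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def top_k(probs, k, rng):
    """Keep only the k most likely tokens, then sample among them.
    random.choices() renormalizes the remaining weights implicitly."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]

def top_p(probs, p, rng):
    """Nucleus sampling: keep the smallest set of most likely tokens
    whose total probability reaches p, then sample among them."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]

# Hypothetical example: a tiny vocabulary and made-up model logits.
vocab = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.5, 0.3, -1.0]
rng = random.Random(0)  # seeded so runs are repeatable

probs = softmax(logits)
print([round(q, 3) for q in probs])
print(vocab[greedy(probs)])              # always "cat"
print(vocab[sample(probs, rng)])         # any of the four words
print(vocab[top_k(probs, 2, rng)])       # "cat" or "dog"
print(vocab[top_p(probs, 0.9, rng)])     # "cat", "dog" or "car"
print(vocab[sample(softmax(logits, temperature=0.5), rng)])
```

Only `greedy` matches the "highest probability wins" description; the other rules can and do pick less likely tokens, which is why the same prompt can produce different outputs from run to run.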