Hacker News new | ask | show | jobs
by cjbillington 847 days ago
Pretty much.

The model outputs a number for each possible token, but rather than just picking the token with the biggest number, each number x is fed to exp(x/T) and then the resulting values are treated as proportional to probabilities. A random token is then chosen according to said probabilities.

In the limit of T going to 0, this corresponds to always choosing the token for which the model output the largest value (making the output deterministic). In the limit of T going to infinity, it corresponds to each token being equally likely to be chosen, which would be gibberish.