by cjbillington
847 days ago
Pretty much. The model outputs a number (a logit) for each possible token, but rather than just picking the token with the biggest number, each number x is passed through exp(x/T), and the resulting values are treated as proportional to probabilities (i.e. they are normalized to sum to 1). A random token is then chosen according to those probabilities. In the limit of T going to 0, this corresponds to always choosing the token for which the model output the largest value, making the output deterministic. In the limit of T going to infinity, every token becomes equally likely to be chosen, which would produce gibberish.
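
For anyone who wants to poke at this, here is a rough sketch of the sampling step in plain Python. The function names (`temperature_probs`, `sample_token`) and the example logits are made up for illustration and are not from any particular library. Subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the probabilities unchanged, and treating T = 0 as a plain argmax is my own shortcut for the limiting case.

```python
import math
import random


def temperature_probs(logits, T):
    """Turn raw model outputs into probabilities proportional to exp(x / T)."""
    if T <= 0:
        raise ValueError("T must be positive; the T -> 0 limit is handled in sample_token")
    # Subtracting the max does not change the result after normalization,
    # but it keeps exp() from overflowing for large logits or small T.
    m = max(logits)
    weights = [math.exp((x - m) / T) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_token(logits, T, rng=random):
    """Pick a token index at random according to the temperature-scaled probabilities."""
    if T == 0:
        # Assumed convention: T = 0 means "always take the largest output".
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, T)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical outputs for a three-token vocabulary.
example_logits = [2.0, 1.0, 0.1]
```

Calling `temperature_probs(example_logits, 0.01)` puts essentially all of the probability on the first token, while `temperature_probs(example_logits, 1e6)` gives each token very nearly 1/3, which matches the two limits described above.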