by hansvm
1154 days ago
> 6 Selecting the likeliest token is only one of many sampling options, and it's extremely poor for most tasks, more so when you consider the relationships between multiple executions of the model. _Some_ (not necessarily softmax) probability renormalization trained into the model is essential for a lot of techniques.

The idea is that this is more general than, e.g., changing the temperature of the softmax or using top-k, where you keep only the k most probable outcomes.
Note that if you do nucleus sampling (aka top-p) with the threshold p = 0%, you just pick the single most likely token, i.e. the maximum-likelihood choice (greedy decoding).
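
To make the comparison concrete, here is a rough sketch of the three knobs in plain Python. The function names (`softmax`, `top_k_filter`, `top_p_filter`, `renormalize`, `sample`) and the toy logits are made up for illustration and are not taken from any particular library; real implementations work on tensors and differ in details such as tie handling.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Temperature must be > 0; lower values sharpen the distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, keep):
    # Zero out everything not in `keep` and rescale the rest to sum to 1.
    keep_set = set(keep)
    total = sum(probs[i] for i in keep_set)
    return [probs[i] / total if i in keep_set else 0.0
            for i in range(len(probs))]

def top_k_filter(probs, k):
    # Keep only the k most probable outcomes.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])

def top_p_filter(probs, p):
    # Keep the smallest prefix of outcomes (by descending probability)
    # whose cumulative mass reaches p. At least one outcome is always kept.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

def sample(probs, rng=random):
    # Draw one index according to the (already normalized) probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy example (hypothetical logits for a 4-token vocabulary).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
greedy = top_p_filter(probs, 0.0)  # all mass lands on the likeliest token
```

With p = 0 the loop stops after the first (most probable) outcome, so `top_p_filter(probs, 0.0)` and `top_k_filter(probs, 1)` both collapse to always picking the likeliest token, which is the degenerate case mentioned above.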