|
|
|
|
|
by lopuhin
157 days ago
|
|
On whether this accounts only the final output layer -- once the first token is generated (i.e. selected according to the modified sampling procedure), and assuming a different token is selected compared to standard sampling, then all layers of the model would be affected during generation of subsequent tokens. |
|