|
|
|
|
|
by thaanpaa
64 days ago
|
|
Well, the fun part is that the algorithms themselves are deterministic. They are just so afraid of model distillation that they force some randomness on top (and now hide thinking). Arguably for coding, you'd probably want temperature=0, and any variation would be dependent on token input alone. |
|
As for distillation... sampling from the temp 1 distribution makes it easier.