by irthomasthomas
721 days ago
Temperature 0 will not eliminate randomness, only reduce it.
In addition, there may be times when temperature > 0 is essential for reproducing text accurately. Consider a model whose knowledge cutoff is 3-6 months out of date, and which is asked to write e.g. a model name that did not exist when it was trained. In that case temperature 0 makes it more likely to "fix" your code by replacing the model name it has never heard of with one that is more probable according to its training data. In other words, if the text you want was not in the model's training data, a higher than normal temperature may be required, depending on how frequently the term appears in the input. If you provide a few samples in the input, then you may be able to use 0 again.
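To make the mechanism concrete, here is a minimal sketch of temperature-scaled softmax over two candidate tokens. The logit values and token names are made up for illustration, and real decoders involve far larger vocabularies and other sampling settings, so treat this as a toy model of the effect rather than how any particular API behaves.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: the model prefers the name it saw during training.
LOGITS = {"familiar-model": 4.0, "unfamiliar-model": 2.0}

def prob_of(token, temperature):
    """Probability of sampling `token` at the given temperature."""
    names = list(LOGITS)
    probs = softmax_with_temperature([LOGITS[n] for n in names], temperature)
    return probs[names.index(token)]

if __name__ == "__main__":
    for t in (0.2, 1.0, 2.0):
        print(f"temperature={t}: P(unfamiliar-model)={prob_of('unfamiliar-model', t):.4f}")
```

As the temperature approaches 0, the probability of the less likely token shrinks toward zero, so in this toy setup the unfamiliar name is almost never emitted. Raising the temperature flattens the distribution and gives it a better chance.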
|
The problem with generating structured output like JSON is that temperature > 0 also increases the likelihood of a token belonging to the set of “wrong” answers being chosen. With prose that’s not the end of the world because subsequent tokens can change the meaning. But with JSON or code, the wrong token in the wrong place can make the output invalid: it’s no longer parseable JSON or compilable code. In the blog they were also generating bools in one spot, and temperature > 0 would probably result in the “wrong” answer being chosen sometimes.
For that reason I’d suggest generating JSON fields independently and then creating the full JSON object from those outputs the old-fashioned way. That way different fields can use different temperature settings. You’d probably want temperature=0 for generating bools/enums/very short answers like “New York”, and temperature > 0 for prose text like summaries or descriptions.
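A rough sketch of what that could look like is below. The `generate(prompt, temperature)` callable is a hypothetical stand-in for whatever model client you use, and the field names, prompts, and temperatures are made up for illustration. `fake_generate` only exists so the example runs without a model.

```python
import json
from typing import Callable

# Hypothetical per-field settings: field name -> (prompt, temperature).
FIELD_SPECS = {
    "city": ("Which city is the text about? Answer with the name only.", 0.0),
    "is_urgent": ("Is the text urgent? Answer true or false.", 0.0),
    "summary": ("Summarize the text in one sentence.", 0.7),
}

def build_record(generate: Callable[[str, float], str]) -> str:
    """Generate each field separately, then assemble the JSON in ordinary code."""
    record = {}
    for field, (prompt, temperature) in FIELD_SPECS.items():
        text = generate(prompt, temperature).strip()
        if field == "is_urgent":
            # Parse the bool ourselves instead of trusting the model's JSON syntax.
            record[field] = text.lower() == "true"
        else:
            record[field] = text
    # json.dumps guarantees the result is syntactically valid JSON.
    return json.dumps(record)

def fake_generate(prompt: str, temperature: float) -> str:
    """Stand-in for a real model call; returns canned answers."""
    if "city" in prompt:
        return "New York"
    if "urgent" in prompt:
        return "true"
    return "A short summary of the text."

if __name__ == "__main__":
    print(build_record(fake_generate))
```

Because the object is serialized by `json.dumps` rather than by the model, a badly sampled token can at worst produce a wrong field value, not unparseable output.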