by throwdbaaway
215 days ago
It is the reasoning. During the reasoning process, the top few tokens have very similar or even identical logprobs, so tiny numerical differences can flip which one gets picked. With gpt-oss-120b, you should be able to get deterministic output by turning off reasoning, e.g. by appending: {"role": "assistant", "content": "<think></think>"}
Of course, the model will be less capable without reasoning.
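For anyone who wants to try it, here is a minimal sketch of what appending that message looks like when building a request body. It assumes an OpenAI-compatible chat endpoint whose chat template honours a prefilled assistant turn, which I have not verified for every server. The helper name `with_reasoning_disabled`, the example prompt, and `temperature: 0` are my own additions for illustration, not something from the tests discussed here.

```python
import json

# The assistant turn quoted above: an empty reasoning block.
NO_REASONING_PREFILL = {"role": "assistant", "content": "<think></think>"}


def with_reasoning_disabled(messages):
    """Return a new message list with the empty-reasoning turn appended.

    Hypothetical helper: the input list is left unmodified.
    """
    return [*messages, dict(NO_REASONING_PREFILL)]


# Example conversation (made up for illustration).
conversation = [
    {"role": "user", "content": "Translate to SQL: count all users."},
]

payload = {
    "model": "gpt-oss-120b",
    "temperature": 0,  # assumption: greedy decoding is also requested
    "messages": with_reasoning_disabled(conversation),
}

# Print the JSON body that would be sent to the endpoint.
print(json.dumps(payload, indent=2))
```

Whether the server actually treats the last assistant message as a prefill depends on the inference stack, so check the rendered prompt before trusting the result.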
Also, the Mistral Medium model we tested had ~70% deterministic outputs across the 16 runs for the text-to-SQL generation and summarization-in-JSON tasks, and it had reasoning on. Llama 3.3 70B started to degrade and doesn’t have reasoning. But it’s a relevant variable to consider.
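Since "deterministic across 16 runs" can be read more than one way, here is a rough sketch of two ways to score it. Both function names and the sample data are hypothetical, and this is not necessarily how the numbers above were computed: `modal_agreement` is the share of runs that match the most common output for one prompt, and `deterministic_share` is the share of prompts whose runs were all identical.

```python
from collections import Counter


def modal_agreement(outputs):
    """Share of runs that equal the most frequent output for a single prompt."""
    if not outputs:
        raise ValueError("need at least one output")
    _, top_count = Counter(outputs).most_common(1)[0]
    return top_count / len(outputs)


def deterministic_share(runs_by_prompt):
    """Share of prompts for which every run produced exactly the same output."""
    if not runs_by_prompt:
        raise ValueError("need at least one prompt")
    identical = sum(1 for runs in runs_by_prompt.values() if len(set(runs)) == 1)
    return identical / len(runs_by_prompt)


# Made-up data: 16 runs per prompt, outputs compared as exact strings.
sample_runs = {
    "prompt_a": ["SELECT COUNT(*) FROM users;"] * 16,
    "prompt_b": ["SELECT COUNT(*) FROM users;"] * 12 + ["SELECT COUNT(id) FROM users;"] * 4,
}

for prompt, runs in sample_runs.items():
    print(prompt, modal_agreement(runs))
print("fully deterministic prompts:", deterministic_share(sample_runs))
```

Exact string comparison is strict; for JSON outputs you may prefer to parse and compare the objects so that whitespace differences do not count as nondeterminism.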