by apstroll
527 days ago
The output distribution is deterministic; the output token is sampled from that distribution and is therefore not deterministic.
Temperature modulates the output distribution, but setting it to 0 (i.e. argmax sampling) is not the norm.
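
To make the distinction concrete, here is a minimal sketch in plain Python. The names `softmax` and `sample_token` and the toy logits are made up for illustration and are not taken from any particular inference library; temperature 0 is treated as argmax by convention, since dividing by zero is undefined.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into a probability distribution (deterministic)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick a token index; only this step involves randomness."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy/argmax decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for a 3-token vocabulary.
logits = [2.0, 1.0, 0.5]
rng = random.Random()

# Same logits always give the same distribution...
assert softmax(logits) == softmax(logits)
# ...but tokens drawn from it at temperature 1.0 can differ between calls.
draws = [sample_token(logits, 1.0, rng) for _ in range(10)]
# At temperature 0 the choice is always the highest-logit token.
greedy = [sample_token(logits, 0, rng) for _ in range(10)]
```

Lower temperatures concentrate probability on the top token, so sampled output gets more repeatable as the temperature approaches 0, but it only becomes strictly repeatable in the argmax case.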
|
LLMs are basically "deterministic" under greedy sampling, except for MoE-related shenanigans (which historically prevented determinism in ChatGPT) or floating-point issues on GPUs. In practice, LLMs are in fact basically "deterministic" except for the sampling/temperature stuff that we add at the very end.
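
A toy illustration of the floating-point point, in plain Python: addition is not associative, so summing the same terms in a different order can change the result in the last bit, and that can be enough to flip an argmax when two logits are nearly tied. This is only a sketch of the mechanism; the numbers and the `argmax` helper are invented for the example, and it does not model how any real GPU kernel orders its reductions.

```python
def argmax(xs):
    """Index of the largest value; ties go to the first index."""
    return max(range(len(xs)), key=lambda i: xs[i])

# The same three terms, added in two different orders.
terms = [0.1, 0.2, 0.3]
left_to_right = (terms[0] + terms[1]) + terms[2]
right_to_left = terms[0] + (terms[1] + terms[2])

# The two sums differ in the last bit.
sums_differ = left_to_right != right_to_left

# Hypothetical near-tie: token 0 has a fixed logit, token 1's logit
# is the sum above, computed in one order or the other.
rival = 0.6
pick_a = argmax([rival, left_to_right])
pick_b = argmax([rival, right_to_left])

# Greedy decoding picks a different token depending on summation order.
picks_differ = pick_a != pick_b
```

Once one token differs, every later step is conditioned on a different prefix, so a one-bit discrepancy can turn into a visibly different completion.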