by alew1
518 days ago
"Temperature" doesn't make sense unless your model is predicting a distribution. You can't "temperature sample" a calculator, for instance. The output of the LLM is a predictive distribution over the next token; this is the formulation you will see in every paper on LLMs. It's true that you can do various things with that distribution other than sampling it: you can compute its entropy, you can find its mode (argmax), etc., but the type signature of the LLM itself is `prompt -> probability distribution over next tokens`.
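To make that type signature concrete, here is a minimal sketch in Python. Everything in it is a stand-in I made up for illustration: `toy_model`, the three-word vocabulary, and the fixed logits are hypothetical, not any real LLM or library API. The point is only that the model returns a distribution, and that sampling, entropy, and argmax are separate things you do to that distribution afterwards.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(prompt):
    """Hypothetical stand-in for an LLM: prompt -> distribution over next tokens.

    The vocabulary and logits are invented; a real model would compute the
    logits from the prompt.
    """
    vocab = ["cat", "dog", "fish"]
    logits = [2.0, 1.0, 0.0]
    return dict(zip(vocab, softmax(logits)))


def entropy(dist):
    """Shannon entropy of the distribution, in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def mode(dist):
    """Most probable token (argmax)."""
    return max(dist, key=dist.get)


def sample(dist, rng):
    """Draw one token according to the distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


dist = toy_model("The pet is a")
rng = random.Random(0)  # seeded only so reruns are repeatable

print(dist)
print("entropy:", entropy(dist))
print("mode:", mode(dist))
print("sample:", sample(dist, rng))
```

Note that `toy_model` never calls `sample`; the three operations at the bottom all consume the same returned distribution.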
Zero temperature => fully deterministic.
The neuron activation levels do not inherently form or represent a probability distribution. That's something we've slapped on after the fact.
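For reference, a rough sketch of what is being argued about, assuming the usual softmax-with-temperature convention (divide the logits by T, then normalize). The function names and the logits are hypothetical, chosen just for the example. As T shrinks, the normalized output piles up on the largest logit, so "zero temperature" is normally implemented as a plain argmax rather than by actually dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Normalize logits / temperature into probabilities.

    Requires temperature > 0; the T = 0 case is handled by greedy() instead.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy() for T = 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Index of the largest logit: the T -> 0 limit of temperature sampling."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 1.0, 0.0]  # made-up raw activations for a 3-token vocabulary

for t in (1.0, 0.5, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])

print("greedy pick:", greedy(logits))
```

The raw `logits` here are arbitrary real numbers; they only become a distribution once the normalization step is applied, which is the step the comment above describes as added after the fact.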