Hacker News
by jll29 784 days ago
Keep in mind:

1. LLMs use random numbers internally, and how much they affect the output can be controlled via the 'temperature' parameter. temperature=0 means no random behavior: the most likely answer is always given, deterministically (although it is widely known that this is not implemented fully correctly in many LLMs).

2. Note also that LLMs have no memory; the 'appearance' of memory is an illusion created by feeding the LLM the whole history of the chat with each new user utterance!
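
To make point 2 concrete, here is a minimal sketch of the "resend the whole history" pattern. `stateless_model` and `Chat` are hypothetical names invented for illustration; the toy function stands in for a real LLM call and is not any vendor's API. The only state lives in the client's `history` list.

```python
def stateless_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: the reply depends only on `prompt`."""
    marker = "my name is "
    idx = prompt.lower().rfind(marker)
    if idx == -1:
        return "I don't know your name."
    # Take the first word after the marker and drop trailing punctuation.
    name = prompt[idx + len(marker):].split()[0].strip(".,!?")
    return f"Your name is {name}."


class Chat:
    """Client-side wrapper that keeps the transcript and resends it every turn."""

    def __init__(self, model):
        self.model = model
        self.history = []  # the only "memory" in the whole system

    def send(self, user_text: str) -> str:
        self.history.append(f"User: {user_text}")
        prompt = "\n".join(self.history)  # full transcript, every time
        reply = self.model(prompt)
        self.history.append(f"Assistant: {reply}")
        return reply


chat = Chat(stateless_model)
chat.send("Hi, my name is Ada.")
print(chat.send("What is my name?"))               # sees the earlier turn
print(stateless_model("User: What is my name?"))   # no history, no "memory"
```

Calling the model directly with only the latest message loses the name, because nothing is stored on the model side in this sketch.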

1. Incorrect. The output of the decoder LLM is the probability distribution of the next token given the input text. Temperature=0 means that the output distribution is not pushed to be closer to a uniform distribution. The randomness comes from sampling the next token according to the output distribution to generate text. If you want determinism, you always take the argmax of the distribution.

Incorrect. The output of the decoder LLM is logits that are then divided by the temperature and passed through softmax to give the probabilities. You can't actually set temperature to 0 (division by zero), but in the limit where temperature approaches 0, softmax converges to argmax.

Temperature = 1 is where it's not pushed in either direction.
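
A rough sketch of the above in plain Python, for anyone who wants to see the numbers. The logits are made-up values, and treating temperature == 0 as "just take the argmax" is an assumption about how an implementation might special-case the limit, not a claim about any particular library.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for the T -> 0 limit")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(logits, temperature, rng=random):
    """Pick a token index: argmax when temperature == 0, otherwise sample."""
    if temperature == 0:
        # Assumed special case: the limit of softmax as T -> 0 is argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up example values
for t in (10.0, 1.0, 0.5, 0.01):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At temperature 1 the result is the ordinary softmax of the logits; larger values flatten it toward uniform, and values near 0 put almost all the mass on the largest logit. Sampling only stops being random once you take the argmax outright.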