| HN Mirror



	by ajwin 595 days ago
	Do LLMs always pick the most probable next word? I would have thought this would lead to the same output every time for a given input. How does that square with the randomness you get from prompting the same thing over and over?

2 comments

8note 595 days ago

There is at least a parameter called Temperature which decides how much randomness to include in the output.

Setting it to 0 doesn't get you perfectly deterministic output though, per https://medium.com/google-cloud/is-a-zero-temperature-determ... as you don't have perfect control over which approximations are made in your floating-point operations.
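
For illustration, here is a minimal sketch of how a temperature parameter is commonly applied: the raw scores (logits) are divided by the temperature before the softmax. The function name and the logit values are made up for the example, and treating temperature 0 as plain argmax is an assumption about how implementations usually handle that case.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Assumption: temperature 0 means argmax, all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures push more probability onto the top token, higher ones flatten the distribution.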


mmoskal 595 days ago

The most typical reason argmax (temp 0) is non-deterministic is that your request is running batched with other people's requests. The number and size of these affect the matrix sizes and thus the tiling decisions. Then you get a different order of floating-point operations and thus different results.

Nvidia gives some guarantees about deterministic results of their kernels, but those only apply when you have the exact same input data, which is not the case with in-flight batching.
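
A toy sketch of the underlying effect, not of any real GPU kernel: floating-point addition is not associative, so adding the same numbers in a different grouping can change the last bits, and that is enough to flip an argmax when two scores are nearly tied. The values and the `argmax` helper are made up for the example.

```python
def argmax(scores):
    """Index of the largest score; ties go to the earliest index."""
    return max(range(len(scores)), key=lambda i: scores[i])

# The same three numbers, added in two different groupings.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # False: the results differ in the last bit

# A rival token whose score is almost identical.
rival = 0.6
print(argmax([rival, grouped_left]))
print(argmax([rival, grouped_right]))
```

The two `argmax` calls pick different indices even though the inputs are mathematically identical.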


janalsncm 595 days ago

It depends. If we use beam search, we pick the (approximately) most likely sequence of tokens rather than the most likely token at each step. This process is deterministic, though.
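
A small sketch of the difference between greedy decoding and beam search, using a hypothetical two-step toy "model" (a lookup table of made-up probabilities, not a real LLM). With a finite beam width the search is only approximate in general, but here it is wide enough to find the best sequence.

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the prefix so far.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}

def greedy(steps):
    """Pick the single most likely token at every step."""
    seq, logp = (), 0.0
    for _ in range(steps):
        token, p = max(TOY_MODEL[seq].items(), key=lambda kv: kv[1])
        seq, logp = seq + (token,), logp + math.log(p)
    return seq, math.exp(logp)

def beam_search(steps, beam_width):
    """Keep the beam_width best partial sequences at every step."""
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = [
            (seq + (token,), logp + math.log(p))
            for seq, logp in beams
            for token, p in TOY_MODEL[seq].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_seq, best_logp = beams[0]
    return best_seq, math.exp(best_logp)

print(greedy(2))
print(beam_search(2, beam_width=2))
```

Greedy commits to "A" because it looks best at the first step, while the beam keeps "B" alive and ends up with the higher-probability sequence. Neither procedure involves any random draw.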

We can also sample from the distribution, which introduces randomness. Basically, if word1 should be chosen 75% of the time and word2 25% of the time, it will do that.
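
As a rough sketch of that 75/25 example, using Python's standard `random` module as a stand-in for a model's sampler (the `sample_words` helper and the seed are made up for illustration):

```python
import random
from collections import Counter

def sample_words(seed, n):
    """Draw n words, choosing word1 with weight 0.75 and word2 with 0.25."""
    rng = random.Random(seed)  # a fixed seed makes the draws repeatable
    return rng.choices(["word1", "word2"], weights=[0.75, 0.25], k=n)

draws = sample_words(seed=0, n=10_000)
counts = Counter(draws)
share_word1 = counts["word1"] / len(draws)
print(counts, share_word1)
```

Over many draws the share of word1 lands close to 0.75. Individual draws still vary unless the seed is fixed.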

The randomness you’re seeing can also be due to implementation details.

https://community.openai.com/t/a-question-on-determinism/818...
