Hacker News new | ask | show | jobs
by 8note 595 days ago
There is at least a parameter called Temperature which decides how much randomness to include in the output.

It doesn't get you perfectly deterministic output to set it to 0 though, per https://medium.com/google-cloud/is-a-zero-temperature-determ... as you don't have perfect control over what approximations are being made on your floating point operations

1 comments

The most typical reason argmax (temp 0) is non-deterministic is that your request is running batched with other people requests. The number and size of these affects the matrix sizes and thus tiling decisions. Then you get different floating point order and thus different results.

Nvidia gives some guarantees about deterministic results of their kernels but that only applies when you have exact same input data and this is not the case when in-flight batching.