Hacker News
by TacticalCoder 900 days ago
Are these LLMs you can run locally giving answers deterministically just as with, say, StableDiffusion? In StableDiffusion if you reuse the exact same version of SD / model and same query and seed, you always get the same result (at least I think so).
2 comments

Even with Stable Diffusion, determinism is “best effort”: there are flags you can set in Torch to make it more deterministic at a performance cost, but it’s explicitly disclaimed:

https://pytorch.org/docs/stable/notes/randomness.html

The base models of stablediffusion were always deterministic if you use a deterministic noise scheduler...
I think they’re referring to CUDA (and possibly other similar runtimes) being able to schedule floating point ops non-deterministically, combined with floating point arithmetic being non-associative, so the order of operations can change the result. I’m not personally sure how big an issue that would be for the output though.
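
To make the non-associativity point concrete, here is a minimal sketch in plain Python (no GPU involved). The helper name `sum_in_order` and the sample values are made up for illustration; the only claim is that the same floats added in a different order can give a different result, which is what a nondeterministic reduction order would expose.

```python
def sum_in_order(values):
    """Add floats strictly left to right, the way a fixed reduction order would."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two groupings.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6

# Same multiset of values, two orderings: the small term is absorbed in one of them.
print(sum_in_order([1e16, 1.0, -1e16]))  # 0.0
print(sum_in_order([1e16, -1e16, 1.0]))  # 1.0
```

Whether differences this small ever become visible in an image or a sampled token depends on the model and settings; the sketch only shows that the arithmetic itself is order-sensitive.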
I have never spotted any difference when regenerating (a recent) image with the same settings/seed/noise and I do it often. Haven't compared the bits though.

Older images are often difficult to reproduce for me - I believe due to changes in tooling (mostly updating Auto1111).

Differences in output are generally varying levels of difficulty of “spot the difference” and rarely change the overall image composition by much. I always use nondeterministic algos and it doesn’t have any effect on my ability to refine prompts effectively.
Yeah this is what I was referring to: GPU/FP issue which, btw, had been explained to me in the past here on HN...
Yes, you can set the temperature to 0, then they should be deterministic.
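
A rough sketch of why temperature 0 removes the sampling randomness, assuming the common convention that temperature 0 is treated as greedy (argmax) decoding. The function `sample_token` and the logits are hypothetical, not any particular library's API.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from logits; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        # Greedy decoding: no random draw at all (ties go to the lowest index).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5]  # made-up scores for three tokens

# Different seeds, same answer: the seed is never consulted at temperature 0.
greedy = [sample_token(logits, 0, random.Random(seed)) for seed in range(5)]
print(greedy)  # [0, 0, 0, 0, 0]
```

This only removes randomness from the sampling step. If the logits themselves differ slightly between runs (the GPU/FP issue discussed above), a near-tie could still flip the argmax.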
Someone mentions temperature in the context of algorithms, can't stop thinking, cool, simulated annealing. Haven't seen temperature used in any other family of algo before this.
If you squint, it’s the same thing. Simulated annealing generally attempts to sample from the Boltzmann distribution. (Presumably because actual annealing is a thermodynamic thing, and you can often think of annealing in a way that the system is a sample from the Boltzmann distribution.)

And softmax is exactly the function that maps energies into the corresponding normalized probabilities under the Boltzmann distribution. Transformers are generally treated as modeling the probabilities of strings, and those probabilities are expressed as energies under the Boltzmann distribution (i.e., logits are on a log scale; strictly, a logit plays the role of a negative energy). Asking your favorite model a question works by sampling from the Boltzmann distribution based on the energies (log probabilities) the model predicts, and you can sample that distribution at any temperature you like.
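
Written out, the correspondence looks like this. The notation is assumed for illustration: energies $E_i$, logits $z_i$, temperature $T$, and Boltzmann's constant set to 1.

```latex
% Boltzmann distribution over states i with energies E_i at temperature T
p_i = \frac{\exp(-E_i / T)}{\sum_j \exp(-E_j / T)}

% Identify logits with negative energies, z_i = -E_i, and this is softmax with temperature
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)} = \operatorname{softmax}(z / T)_i

% Limits: T \to 0 puts all mass on \arg\max_i z_i (greedy); T \to \infty approaches uniform
```

Simulated annealing uses the same distribution, typically through the Metropolis rule of accepting a move with probability $\min(1, \exp(-\Delta E / T))$ while lowering $T$ over time. LLM sampling usually keeps $T$ fixed instead.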

I'm interested, how does LLM temperature relate to simulated annealing?