| HN Mirror



	by TacticalCoder 900 days ago
	Do these LLMs you can run locally give deterministic answers, as with, say, Stable Diffusion? In Stable Diffusion, if you reuse the exact same version of SD and model with the same query and seed, you always get the same result (at least I think so).

2 comments

JimDabell 900 days ago

Even with Stable Diffusion, determinism is “best effort”: there are flags you can set in Torch to make it more deterministic at a performance cost, but full determinism is explicitly disclaimed:

https://pytorch.org/docs/stable/notes/randomness.html


Zetobal 900 days ago

The base models of Stable Diffusion were always deterministic if you use a deterministic noise scheduler...


pizza 900 days ago

I think they’re referring to CUDA (and possibly other similar runtimes) being able to schedule floating-point ops non-deterministically, combined with floating-point addition not being associative, so the order of operations can change the rounded result. I’m not personally sure how big an issue that would be for the output though.
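To make the non-associativity point concrete, here is a minimal CPU-only sketch in plain Python. It does not touch CUDA at all; `sum_in_order` is a hypothetical helper that just stands in for a reduction whose accumulation order can vary between runs.

```python
# Floating-point addition is not associative: the grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # the two groupings disagree in the last bit


def sum_in_order(values):
    """Accumulate left to right with a plain loop (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two accumulation orders, two different answers.
# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost in the first order.
order_a = sum_in_order([1e16, 1.0, -1e16])
order_b = sum_in_order([1e16, -1e16, 1.0])
print(order_a, order_b)
```

The second case is deliberately extreme. In a real model the differences would typically be in the low-order bits, which fits the "spot the difference" experience described elsewhere in this thread.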


yreg 900 days ago

I have never spotted any difference when regenerating (a recent) image with the same settings/seed/noise and I do it often. Haven't compared the bits though.

Older images are often difficult to reproduce for me - I believe due to changes in tooling (mostly updating Auto1111).


Our_Benefactors 900 days ago

Differences in output are generally varying levels of “spot the difference” difficulty and rarely change the overall image composition by much. I always use nondeterministic algos and it doesn’t have any effect on my ability to refine prompts effectively.


TacticalCoder 899 days ago

Yeah this is what I was referring to: GPU/FP issue which, btw, had been explained to me in the past here on HN...


tionis 900 days ago

Yes, you can set the temperature to 0; then they should be deterministic.
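A rough sketch of why, assuming temperature 0 is implemented as greedy decoding (picking the largest logit), which is the common convention. `greedy_pick` and the logit values are made up for illustration. The second half shows the caveat: the pick is only reproducible if the logits themselves are bit-identical between runs.

```python
def greedy_pick(logits):
    """Return the index of the largest logit (hypothetical helper)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# With identical logits, the choice never varies.
logits = [2.0, 5.0, 1.0]
picks = {greedy_pick(logits) for _ in range(1000)}
print(picks)

# Caveat: if two candidates are nearly tied, a perturbation on the order
# of floating-point rounding error is enough to flip the chosen token.
run_a = [3.0, 3.0 + 1e-15, 0.5]
run_b = [3.0 + 1e-15, 3.0, 0.5]
print(greedy_pick(run_a), greedy_pick(run_b))
```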


dilawar 900 days ago

Whenever someone mentions temperature in the context of algorithms, I can't stop thinking: cool, simulated annealing. I haven't seen temperature used in any other family of algorithms before this.


amluto 900 days ago

If you squint, it’s the same thing. Simulated annealing generally attempts to sample from the Boltzmann distribution. (Presumably because actual annealing is a thermodynamic thing, and you can often think of annealing in a way that the system is a sample from the Boltzmann distribution.)

And softmax is exactly the function that maps energies into the corresponding normalized probabilities under the Boltzmann distribution. Transformers are generally treated as modeling the probabilities of strings, with those probabilities expressed as energies under the Boltzmann distribution (i.e., logits are on a log scale). Asking your favorite model a question works by sampling from the Boltzmann distribution based on the energies (log probabilities) the model predicts, and you can sample that distribution at any temperature you like.
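As a minimal sketch of that correspondence: treat each logit as a negative energy (the sign convention is my assumption here), so that softmax of `logits / temperature` gives the Boltzmann probabilities. The function names and example logits are invented for illustration and are not taken from any particular library.

```python
import math
import random


def boltzmann_probs(logits, temperature):
    """Softmax of logits / temperature: Boltzmann weights with energy = -logit."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    partition = sum(weights)  # the normalizing constant Z
    return [w / partition for w in weights]


def sample_token(logits, temperature, rng):
    """Draw one index from the Boltzmann distribution at the given temperature."""
    probs = boltzmann_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
rng = random.Random(0)
for temperature in (2.0, 1.0, 0.1):
    probs = boltzmann_probs(logits, temperature)
    token = sample_token(logits, temperature, rng)
    print(temperature, [round(p, 3) for p in probs], token)
```

Lower temperatures concentrate the probability mass on the highest logit and higher temperatures flatten the distribution, which is the same knob simulated annealing turns as it cools.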


potatoman22 900 days ago

I'm interested, how does LLM temperature relate to simulated annealing?
