| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by phyalow 352 days ago
	What? You can get consistent output on local models. I can train large nets deterministically too (CUBLAS flags). What your saying isn't true in practice. Hell I can also go on the anthropic API right now and get verbatim static results.

2 comments

simonw 351 days ago

"Hell I can also go on the anthropic API right now and get verbatim static results."

How?

Setting temperature to 0 won't guarantee the exact same output for the exact same input, because - as the previous commenter said - floating point arithmetic is non-commutative, which becomes important when you are running parallel operations on GPUs.

link

sva_ 350 days ago

Shouldn't it be the fact that they're non-associative? Because the reduction kernels will combine partial results (like the dot‑products in a GEMM or the sum across attention heads) in a way that the order of operations may change (non-associative), which can lead to the individual floats to be round off differently.

link

oxidi 351 days ago

I think lots of people misunderstand that the "non-deterministic" nature of LLMs come from sampling the token distribution, not from the model itself.

link

simonw 351 days ago

It's also the way the model runs. Setting temperature to zero and picking a fixed seed would ideally result in deterministic output from the sampler, but in parallel execution of matrix arithmetic (eg using a GPU) the order of floating point operations starts to matter, so timing differences can produce different results.

link

oxidi 351 days ago

Good point. Though sampling generally happens on the CPU in a linear way. What you describe might influence the raw output logits from a single LLM step, but since the differences are only tiny, a well designed sampler could still make the output deterministic (so same seed = same text output). With a very high temperature these small differences might influence the output though, since the ranking of two tokens might be swapped.

I think the usual misconception is to think that LLM outputs are random "by default". IMHO this apparent randomness is more of a feature rather than a bug, but that may be a different conversation.

link