| HN Mirror



	by diggan 397 days ago
	temperature=0.0 would also be needed for reproducible outputs. Even then, whether the output is actually reproducible depends on the model and weights themselves; not all of them are, even when all those things are static. Finally, it also depends on the implementation of the model architecture. I think the tricky part is that we tend to think that prompts with similar semantic meaning will give the same outputs (like a human), while LLMs can give vastly different outputs if you make one spelling mistake, for example, or use "!" instead of "?". The size of the effect varies greatly per model.

2 comments

dcminter 397 days ago

Hmm, I'm barely even a dabbler, but I'd assumed that the seed in question drove the (pseudo)randomness inherent in "temperature" - if not, what seed(s) do they use and why could one not set that/those too?

To your second part, I wouldn't make that assumption. I can see how a non-technical person might, but surely programmers wouldn't? I've certainly produced very different output from that which I intended in boring old C with a misplaced semicolon, after all!


diggan 397 days ago

> Hmm, I'm barely even a dabbler, but I'd assumed that the seed in question drove the (pseudo)randomness inherent in "temperature" - if not, what seed(s) do they use and why could one not set that/those too?

Implementations and architectures are different enough that it's hard to say "It's like X" in all cases. The last time I tried to achieve 100% reproducible outputs, which obviously included hard-coding various seeds, I remember not getting reproducible outputs unless I set temperature to 0. I think this was with Qwen2 or QwQ used via Hugging Face's Transformers library, but I cannot find the exact details now.

Then in other cases, like the hosted OpenAI models, they straight up say "temperature to 0 makes them mostly deterministic", but I'm not exactly sure why they are unable to offer endpoints with determinism.
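As a rough illustration of how the seed and temperature interact in the simplest case, here is a toy sampler written with only the Python standard library. It is a sketch under stated assumptions, not how Transformers or any hosted API is implemented: the logits are made up, and `sample_token` and `run` are hypothetical names. In this idealized setting a fixed seed is enough for reproducible sampling, so it does not explain the non-reproducibility described above; it only shows that temperature 0 removes the random draw entirely.

```python
import math
import random

# Made-up logits for a 4-token vocabulary (illustrative values only).
LOGITS = [2.0, 1.5, 0.3, -1.0]

def sample_token(logits, temperature, rng):
    """Return a token index chosen from raw logits."""
    if temperature == 0.0:
        # Greedy decoding: take the largest logit, no randomness used.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    weights = [math.exp(value - top) for value in scaled]
    # Draw one index with probability proportional to its weight.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def run(seed, temperature, n=10):
    """Sample n tokens using a private generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(LOGITS, temperature, rng) for _ in range(n)]

if __name__ == "__main__":
    print("temperature 0, seed 1:  ", run(1, 0.0))
    print("temperature 0, seed 2:  ", run(2, 0.0))
    print("temperature 1, seed 42: ", run(42, 1.0))
    print("temperature 1, seed 42: ", run(42, 1.0))
```

With temperature 0 the seed never enters the computation, so any remaining run-to-run differences in a real stack would have to come from somewhere other than the sampler's random number generator.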

> I can see how a non-technical person might, but surely programmers wouldn't?

When talking even with developers about prompting and LLMs, there are still quite a few people who are surprised that "You are a helpful assistant." would lead to different outputs than "You are a helpful assistant!". I think if you're a programmer or not matters less, more about understanding how the LLMs actually work in order to understand that.


dcminter 397 days ago

Oh, well that's super interesting, thanks; I guess some side effect of the high degree of parallelism? Anyway, I guess I need to do a bit more than dabble.
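One way the parallelism guess could play out, offered as an assumption rather than something established in this thread: floating-point addition is not associative, so summing the same numbers in a different order can change the last bits of the result. The contrived sketch below (the `argmax` helper and the score values are made up for illustration) shows such a difference flipping a greedy, temperature-0 choice between two nearly tied candidates.

```python
# The same three numbers added in two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

def argmax(scores):
    """Index of the largest score; ties go to the earliest index."""
    return max(range(len(scores)), key=lambda i: scores[i])

# A rival candidate whose score is exactly 0.6 (made-up value).
rival = 0.6

pick_left = argmax([rival, left])    # score computed in one order
pick_right = argmax([rival, right])  # same score, other order

if __name__ == "__main__":
    print("left == right:", left == right)
    print("pick with left grouping: ", pick_left)
    print("pick with right grouping:", pick_right)
```

Real models only hit this when two candidates are almost tied, but once a single token differs, everything generated after it can diverge.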

> I think if you're a programmer or not matters less, more about understanding how the LLMs actually work in order to understand that.

Sounds like I need to understand them better then, as I merely had different misapprehensions than those. More reading for me...


smokel 397 days ago

> I think the tricky part is that we tend to think that prompts with similar semantic meaning will give the same outputs (like a human)

Trust me, this response would have been totally different if I were in a different mood.
