Hacker News
by layer8 372 days ago
Temperature > 0 isn’t a problem as long as you can specify/save the random seed and everything else is deterministic. Of course, “as long as” is still a tall order here.
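To make the seed point concrete, here is a minimal sketch of temperature sampling with a saved seed. It is a toy model and not any provider's implementation: the logits, the `sample_token` and `generate` helpers, and the seed value are all made up for illustration. It assumes everything except the random draw is deterministic.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Draw one token index from a temperature-scaled softmax over logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits, temperature, seed, n=20):
    """Sample n tokens using a private RNG seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]

# Hypothetical fixed logits standing in for a model's output.
logits = [2.0, 1.0, 0.5, 0.1]

run_a = generate(logits, temperature=0.8, seed=1234)
run_b = generate(logits, temperature=0.8, seed=1234)
print(run_a == run_b)  # True: same seed, same inputs, same samples
```

The two runs match only because the logits are bit-identical both times, which is exactly the "everything else is deterministic" condition.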

My understanding is that modern hosted LLM implementations are nondeterministic even with a known seed, because the generated output is sensitive to a number of other factors, including but not limited to the other prompts running in the same batch.
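One commonly cited mechanism for this is that floating-point addition is not associative, so if the order of a reduction changes (for example with batch size or kernel choice), the computed logits can differ in the last bits. Whether that is what happens at any particular provider is an assumption here; the sketch below only shows the arithmetic effect with made-up numbers.

```python
# Three partial sums that contribute to one token's score (illustrative values).
partials = [0.1, 0.2, 0.3]

# The same numbers added in two different orders.
left_to_right = (partials[0] + partials[1]) + partials[2]
right_to_left = partials[0] + (partials[1] + partials[2])

print(left_to_right == right_to_left)  # False: the results differ in the last bit

# A hypothetical competing token whose score is almost tied.
competitor = 0.6

# Greedy choice between the two tokens depends on the summation order.
print(left_to_right > competitor)  # True
print(right_to_left > competitor)  # False
```

Once one token differs, every later token is conditioned on a different prefix, so a last-bit difference can grow into a visibly different completion even with the seed fixed.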
Gemini, for example, launched implicit caching on or about 2025-05-08 (https://developers.googleblog.com/en/gemini-2-5-models-now-s...):

> Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount.

> In order to increase the chance that your request contains a cache hit, you should keep the content at the beginning of the request the same and add things like a user's question or other additional context that might change from request to request at the end of the prompt.
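A rough sketch of the prompt layout that advice describes: static content first, per-request content last. It does not call any Gemini API and does not model the actual cache eligibility rules; `STATIC_PREFIX` and `build_prompt` are hypothetical names used only for illustration.

```python
import os

# Hypothetical instructions that stay identical across requests.
STATIC_PREFIX = (
    "You are a support assistant for ExampleCo.\n"
    "Answer briefly and cite the relevant policy section.\n"
)

def build_prompt(question):
    """Put the unchanging text first and the per-request text at the end."""
    return STATIC_PREFIX + "User question: " + question

a = build_prompt("How do I reset my password?")
b = build_prompt("What is the refund policy?")

# os.path.commonprefix compares character by character, so it works on any strings.
shared = os.path.commonprefix([a, b])
print(shared.startswith(STATIC_PREFIX))  # True: the whole static block is shared
```

If the question were placed before the instructions, the two prompts would diverge almost immediately and share little or no prefix.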

From https://news.ycombinator.com/item?id=43939774 re: same:

> Does this make it appear that the LLM's responses converge on one answer when actually it's just caching?

Have any of the major hosted LLM providers ever shared the temperature settings that their responses were generated with?