| > so I can't just store my prompts in git and know that I'll get the same behavior each time. Yes you can, albeit it's pretty silly to do so. LLMs are not (inherently) nondeterministic, you're just not pinning the seed, or are using a remote, managed service for them with no reliability and consistency guarantees. [0] Here, experiment with me [1]: - download Ollama 0.9.3 (current latest) - download the model gemma3n e4b (digest: 15cb39fd9394) using the command "ollama run gemma3n:e4b" - pin the seed to some constant; let's use 42 as an example, by issuing the following command in the ollama interactive session started in the previous step: "/set parameter seed 42" - prompt the model: "were the original macs really exactly 9 inch?" It will respond with: > You're right to question that! The original Macintosh (released in 1984) *was not exactly 9 inches*. It was
marketed as having a *9-inch CRT display*, but the actual usable screen size was a bit smaller. (...) Full response here: https://pastebin.com/PvFc4yH7 The response should be the same over time, across all devices, regardless of whether GPU acceleration is available. Bit of an aside, but the overall sentiment echoed in the article reminds me of how visual programming was going to revolutionize everything and take programmers' jobs. The difference is that I find AI actually useful, and I was able to integrate it into my workflow. [0] All of this is to say: to the extent LLM nondeterminism is currently a model trait, it is simulated with a PRNG. Actual nondeterminism is typically an inference engine trait at most; see e.g. batched inference. [1] Details are for experiment reproduction purposes. You can substitute the listed inference engine, model, seed, and prompt with whatever you prefer for your own set of experiments. |
Strangely, I'm only getting 2 alternating results every time I restart the model. I was not able to get the same result as you and certainly not with links to external sources. Is there anything else I could do to try to replicate your result?
I've only used ChatGPT before, and it'd be nice to use locally run models with consistent results.
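For anyone who wants to check the seed-pinning experiment without retyping commands in the interactive session, here is a minimal sketch that sends the same prompt several times through Ollama's local REST API and counts how many distinct responses come back. It assumes Ollama is serving on its default address (http://localhost:11434) and that gemma3n:e4b has already been pulled. The helper names (`build_payload`, `generate`, `count_distinct`, `report`) are made up for illustration. Note that this uses the `/api/generate` endpoint with a fresh context on every call, so the text may not match what the interactive session prints; the point is only to see whether repeated runs agree with each other.

```python
import json
import urllib.request
from collections import Counter

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="gemma3n:e4b", seed=42):
    """Build the request body for one non-streamed completion with a pinned seed."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,            # return one JSON object instead of a stream
        "options": {"seed": seed},  # the API counterpart of "/set parameter seed 42"
    }


def generate(prompt, **kwargs):
    """Send one request to the local Ollama server and return the response text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


def count_distinct(outputs):
    """Count how often each distinct output occurred."""
    return Counter(outputs)


def report(prompt, runs=5):
    """Run the same prompt several times and print each distinct result with its count."""
    outputs = [generate(prompt) for _ in range(runs)]
    for text, n in count_distinct(outputs).most_common():
        print(n, repr(text[:80]))
```

Calling `report("were the original macs really exactly 9 inch?")` with the server running prints one line per distinct response. A single line means the runs agreed; two lines would match the "2 alternating results" situation described above.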