Hopefully I'm not sounding too pedantic in mentioning this, but LLMs are still deterministic if you're using the same prompt, seed, temperature, etc. (sometimes it even requires the same hardware).
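To make the "same seed, same temperature" part concrete, here's a toy sketch of temperature sampling with a seeded RNG. It isn't how any particular inference stack does it. `sample_token`, `generate` and the logits are all hypothetical, and a real model would recompute the logits after every token. Given bit-identical logits, two runs with the same seed and temperature pick the same tokens. The catch is getting bit-identical logits in the first place.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from `logits` via temperature-scaled softmax sampling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits, temperature, seed, steps=10):
    """Sample `steps` tokens from fixed (hypothetical) logits with a seeded RNG."""
    rng = random.Random(seed)  # private RNG, so nothing else can advance its state
    return [sample_token(logits, temperature, rng) for _ in range(steps)]

toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up logits, not from a real model
run_a = generate(toy_logits, temperature=0.8, seed=42)
run_b = generate(toy_logits, temperature=0.8, seed=42)
print(run_a == run_b)  # True: same logits + same seed + same temperature
```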
Are they? AFAIK, the “etc” includes using hardware that produces the same results for a given input every time. Once you start to multi-thread/multi-process in combination with floating point math, that can be hard to accomplish.
For example, the result of summing a stream of floats depends on the order they're added in, because floating-point addition isn't associative. In a multi-threaded sum, that order can change depending on what’s in your CPU cache when you start a computation, on whether something else running on your system (such as a timer interrupt) evicts something from cache during a computation, etc.
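A small sketch of the ordering effect, in plain Python rather than real threads. `ordered_sum` is a hypothetical helper that adds values strictly left to right, and the chunked part is only a toy model of a parallel reduction: each "thread" sums its own chunk, and the partial sums get combined in whatever order the threads happen to finish (simulated here with a shuffle).

```python
import math
import random

def ordered_sum(values):
    """Add floats strictly left to right, like one thread walking a buffer."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same four numbers, two orders, two different answers.
print(ordered_sum([1e16, 1.0, -1e16, 1.0]))  # 1.0 (the first 1.0 is lost to rounding)
print(ordered_sum([1e16, -1e16, 1.0, 1.0]))  # 2.0
print(math.fsum([1e16, 1.0, -1e16, 1.0]))    # 2.0 (correctly rounded reference)

# Toy model of a multi-threaded reduction: per-chunk partial sums are
# combined in a different "finishing order" on each run.
rng = random.Random(0)
data = [rng.uniform(-1e6, 1e6) for _ in range(10_000)]
partials = [ordered_sum(data[i:i + 1000]) for i in range(0, len(data), 1000)]
results = set()
for _ in range(20):
    rng.shuffle(partials)  # pretend the threads finished in a new order
    results.add(ordered_sum(partials))
print(len(results), "distinct totals from the same data")
```

How many distinct totals you get depends on the data, but any count above 1 means the same inputs produced different outputs purely because of ordering.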
If you’re running on your GPU, even if the behavior of your GPU is 100% predictable (I wouldn’t know if that’s true on modern hardware, but my guess is it isn’t), anything else that also uses the GPU can change things.