Varying the deployment type (chip model, number of chips, batch size, ...) can also change the output due to rounding errors. See https://arxiv.org/abs/2506.09501 for some details on that.
The two parts of your statement don't go together. A list of potential output tokens and their probabilities are generated deterministically but the actual token returned is then chosen at random (weighted based on the "temperature" parameter and the probability value).
I assume they use software-based pseudo-random-number generators. Those can typically be given a seed-value which determines (deterministically) the sequence of random numbers that will be generated.
So if an LLM uses a seedable pseudo-random-number-generator for its random numbers, then it can be fully deterministic.
There are subtle sources of nondeterminism in concurrent floating point operations, especially on GPU. So even with a fixed seed, if an LLM encounters two tokens with very close likelihoods, it may pick one or the other across different runs. This has been observed even with temperature=0, which in principle does not involve _any_ randomness (see arXiv paper cited earlier in this thread).
Varying the deployment type (chip model, number of chips, batch size, ...) can also change the output due to rounding errors. See https://arxiv.org/abs/2506.09501 for some details on that.