
That's not strictly correct. Every LLM outputs logits, which are softmaxed into a probability distribution over the next token, and that distribution is indeed deterministic.
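A minimal sketch of that step in Python. It assumes a made-up three-token vocabulary whose logits were picked so the softmax lands on 50% / 30% / 20%. Nothing random happens here, so the same logits always give the same distribution.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-token vocabulary.
logits = [math.log(0.5), math.log(0.3), math.log(0.2)]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # [0.5, 0.3, 0.2]
```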

Most generative AI apps set a nonzero temperature, which rescales the logits (they're divided by the temperature) before the softmax, reshaping the distribution. So if you have a distribution of 50%, 30%, and 20% across three tokens and a temperature of 1, you'd get up to 3 different next tokens at that step, sampled at exactly those probabilities, and those choices cascade iteratively into completely different texts. The RNG behind those selections can be controlled by a seed, but with distributed systems that often isn't possible: I've only seen seeds returned in cases where the entire model runs on a single system. Otherwise, simply not setting a seed gives you sufficient randomness.
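A rough sketch of temperature sampling in Python, using the standard library's `random.Random` as the RNG. The helper names (`temperature_probs`, `sample_token`, `generate`) and the three-token logits are made up for illustration. Real inference stacks do this on GPU tensors, but the idea is the same: divide the logits by the temperature, softmax, then draw with an optionally seeded RNG.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Divide logits by the temperature, then softmax. T=1 leaves the distribution unchanged."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(logits, n, temperature, seed=None):
    """Draw n tokens; a fixed seed makes the whole sequence reproducible."""
    rng = random.Random(seed)  # seed=None seeds from OS randomness / system time
    return [sample_token(logits, temperature, rng) for _ in range(n)]

# Hypothetical logits matching the 50% / 30% / 20% example.
logits = [math.log(0.5), math.log(0.3), math.log(0.2)]

print([round(p, 3) for p in temperature_probs(logits, 1.0)])  # [0.5, 0.3, 0.2]
print([round(p, 3) for p in temperature_probs(logits, 0.5)])  # [0.658, 0.237, 0.105]
print(generate(logits, 10, 1.0, seed=42) == generate(logits, 10, 1.0, seed=42))  # True
```

A temperature below 1 sharpens the distribution toward the top token and a temperature above 1 flattens it; at exactly 1 the draws follow the original 50/30/20 split.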

If the temperature is 0, it instead always picks the token with the highest probability, so when done iteratively the final output will be the same every time. (This doesn't account for distributed-system weirdness.)
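Dividing logits by a temperature of 0 is undefined, so implementations typically special-case it as a plain argmax (greedy decoding). A minimal sketch in Python, where the "model" is a hypothetical hard-coded bigram table standing in for a real LLM's next-token logits. With no RNG involved, every run produces the same text.

```python
# Hypothetical toy "model": next-token logits depend only on the previous token.
BIGRAM_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.5, "cat": 0.1},
    "the": {"cat": 1.2, "dog": 1.1, "the": -1.0},
    "a": {"dog": 0.9, "cat": 0.8},
    "cat": {"sat": 2.0, "</s>": 0.5},
    "dog": {"ran": 1.0, "</s>": 0.7},
    "sat": {"</s>": 3.0},
    "ran": {"</s>": 3.0},
}

def greedy_decode(max_len=10):
    """Repeatedly take the argmax token (temperature-0 behaviour) until end-of-sequence."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        logits = BIGRAM_LOGITS[prev]
        # max() returns the first key with the highest logit, so ties are broken deterministically
        tok = max(logits, key=logits.get)
        if tok == "</s>":
            break
        out.append(tok)
        prev = tok
    return out

print(" ".join(greedy_decode()))  # the cat sat
```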