Hacker News
by FabHK 1103 days ago
My understanding is the opposite. The entire process results in a "score" for every candidate output token, which is then converted into a probability of being picked, using a softmax that takes a temperature as a parameter. With a temperature of zero (in practice, the limit as the temperature approaches zero), the "best" token is always picked, but interestingly enough, that does not give optimal results. So sometimes you want the second or even third best. Thus, a "good" (GPT-like) LLM is intrinsically random.
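A minimal sketch of that score-to-probability step, in case it helps. The scores are made up and `softmax_with_temperature` is just an illustrative name, not any particular library's API:

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Division by zero otherwise; temperature 0 is normally handled as a plain argmax.
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

As the temperature drops, nearly all of the probability mass moves to the highest-scoring token, which is why temperature zero amounts to always picking the "best" one.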

To put it differently: You can make them deterministic by using a temperature of zero (then the output would be pretty bad and repetitive), or by using a "better" temperature and fixing a random seed (then the output would be better, but it would only be deterministic in the same sense as a simulation of Brownian motion with a fixed random seed).
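Both routes to determinism can be sketched in a few lines. This is a toy under stated assumptions: the scores are held fixed (a real model recomputes them after every token), and `sample_tokens` is a hypothetical helper, not a real API:

```python
import math
import random

def sample_tokens(scores, temperature, n, seed=None):
    """Pick n token indices from fixed scores; temperature 0 means always take the argmax."""
    if temperature == 0:
        best = max(range(len(scores)), key=lambda i: scores[i])
        return [best] * n
    rng = random.Random(seed)  # private generator, so the seed alone controls the draws
    m = max(scores)
    # Unnormalized softmax weights; choices() normalizes them internally.
    weights = [math.exp((s - m) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=n)

scores = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(sample_tokens(scores, 0, 10))             # same token every time
print(sample_tokens(scores, 1.0, 10, seed=42))  # varied, but reproducible
print(sample_tokens(scores, 1.0, 10, seed=42))  # identical to the line above
```

The seeded run is reproducible, but only in the Brownian-motion sense: change the seed and you get a different, equally valid sequence.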

https://ai.stackexchange.com/questions/32477/what-is-the-tem...

Section 3.3 in https://www.lesswrong.com/posts/pHPmMGEMYefk9jLeh/llm-basics...

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

1 comment

So the randomness is not mandatory for the LLM to work; without it the output is just boring. This means that as a language model it still performs perfectly well at modeling language. We just add some random saltiness for fun.

I would guess the random step is not even mandatory: there is probably a way to replace randomness with a simplified function and still get interesting text. I can't run a simulation, but there is no indication here that high-quality randomness is needed.
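To make that guess concrete, here is one hypothetical way to do it: drive the token choice with a fixed arithmetic sequence (multiples of the golden ratio's fractional part) instead of a random number generator. The choice of sequence and the names `quasi_uniform` and `pick_by_function` are my assumptions, and this sketch says nothing about whether the resulting text would actually be interesting:

```python
GOLDEN_RATIO_FRACTION = 0.6180339887498949  # fractional part of the golden ratio

def quasi_uniform(step):
    """Deterministic value in [0, 1) for a given step number; no RNG state involved."""
    return (step * GOLDEN_RATIO_FRACTION) % 1.0

def pick_by_function(probs, step):
    """Inverse-CDF selection driven by quasi_uniform instead of a random draw."""
    u = quasi_uniform(step)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against rounding when probs sum to slightly under 1

probs = [0.7, 0.2, 0.1]  # hypothetical token probabilities
print([pick_by_function(probs, step) for step in range(10)])
```

The picks still spread across the tokens roughly in proportion to their probabilities, yet every run gives the same sequence with no seed to manage.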

Fundamentally, the design of the transformer, and especially its attention-based core, does not require randomness, so calling it a stochastic model is a stretch.