Hacker News
by seydor 1107 days ago
LLMs are not stochastic though; they are deterministic and don't even require random numbers, right?

The term in general seems unfortunate, because the models seem to do more than parroting. LLMs are more like the central pattern generators of the nervous system: able to flexibly create well-coordinated patterns when guided appropriately.

Simulations of Brownian motion are not stochastic though, they are deterministic if you fix their random seed, right?
Stochasticity is mandatory for modeling Brownian motion.

Actually, transformers do not require randomness at all, so it's not comparable at all.

My understanding is the opposite. The entire process results in a "score" (logit) for every possible output token, which is then converted into a probability of being picked using a softmax that takes a temperature as a parameter. With a temperature of zero (in practice, the limit case of simply taking the highest score), the "best" token is always picked, but interestingly enough, that does not give optimal results. So sometimes you want the second or even third best. Thus, a "good" (GPT-like) LLM is intrinsically random.
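A minimal Python sketch of the softmax-with-temperature step described above, assuming made-up scores for a four-token vocabulary. The function names are hypothetical, not any particular library's API, and since dividing by a temperature of exactly zero is undefined, the zero case is modeled as a separate greedy (argmax) pick.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw token scores (logits) into pick probabilities.

    Dividing by the temperature sharpens the distribution when it is
    below 1 and flattens it when it is above 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_pick for the T -> 0 case")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(scores):
    """The T -> 0 limit: always return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

def sample_token(scores, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

# Hypothetical scores for a four-token vocabulary.
scores = [2.0, 1.5, 0.3, -1.0]
print(softmax_with_temperature(scores, 1.0))
print(softmax_with_temperature(scores, 0.1))  # nearly all mass on token 0
print(greedy_pick(scores))                    # always 0
rng = random.Random()
print([sample_token(scores, 1.0, rng) for _ in range(10)])  # varies between runs; lower-scored tokens are picked less often
```

Lowering the temperature pushes the probabilities toward the greedy pick; raising it spreads them out, which is where the second- and third-best tokens come from.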

To put it differently: you can make them deterministic by using a temperature of zero (then the output would be pretty bad and repetitive), or by using a "better" temperature and fixing a random seed (then the output would be better, but it would only be deterministic in the same sense as a simulation of Brownian motion with a fixed random seed).
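To illustrate the fixed-seed case, a minimal sketch in which each step's scores are held fixed (an assumption; a real model recomputes them from the text generated so far) and all draws come from a dedicated `random.Random(seed)`. The helper name `sample_sequence` is hypothetical.

```python
import math
import random

def sample_sequence(scores_per_step, temperature, seed):
    """Sample one token index per step from a temperature-scaled softmax.

    A dedicated random.Random(seed) makes the draws reproducible: the
    same seed always yields the same sequence, just like a Brownian
    motion simulation run with a fixed seed.
    """
    rng = random.Random(seed)
    picks = []
    for scores in scores_per_step:
        scaled = [s / temperature for s in scores]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]  # choices() accepts unnormalized weights
        picks.append(rng.choices(range(len(scores)), weights=weights, k=1)[0])
    return picks

# Hypothetical, fixed per-step scores (20 steps, four-token vocabulary).
steps = [[2.0, 1.5, 0.3, -1.0]] * 20

a = sample_sequence(steps, temperature=1.0, seed=42)
b = sample_sequence(steps, temperature=1.0, seed=42)
c = sample_sequence(steps, temperature=1.0, seed=7)
print(a == b)  # True: same seed, same output
print(a == c)  # very likely False: a different seed usually changes the picks
```

Same seed, same output; the determinism comes from replaying the same pseudo-random numbers, not from removing the sampling step.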

https://ai.stackexchange.com/questions/32477/what-is-the-tem...

Section 3.3 in https://www.lesswrong.com/posts/pHPmMGEMYefk9jLeh/llm-basics...

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

So the randomness is not mandatory for the LLM to work; without it, the output is just boring. This means that as a language model it still performs perfectly well at modeling language. We just add a little random saltiness for fun.

I would guess the random step is not even mandatory: there is probably a way to replace randomness with a simplified function and still get interesting text. I can't run a simulation, but there is no indication here that good randomness is needed.
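One possible reading of this suggestion, as a sketch under stated assumptions: keep the usual inverse-CDF sampling step, but feed it a deterministic low-discrepancy sequence (a golden-ratio Weyl sequence, swapped in here purely for illustration) instead of random numbers. The probabilities and helper names are hypothetical, and this says nothing about whether a real model would produce interesting text this way; it only shows that picks can vary from step to step with no random number generator involved.

```python
import math
from bisect import bisect_right
from itertools import accumulate

# Irrational step (~0.618) for a Weyl sequence; any irrational number would do.
GOLDEN = (math.sqrt(5) - 1) / 2

def weyl_points(n):
    """n deterministic points in [0, 1) that spread evenly over the interval."""
    return [(k * GOLDEN) % 1.0 for k in range(1, n + 1)]

def inverse_cdf_pick(probs, u):
    """Return the index whose cumulative-probability bucket contains u."""
    cdf = list(accumulate(probs))
    idx = bisect_right(cdf, u * cdf[-1])
    return min(idx, len(probs) - 1)  # guard against floating-point round-off

# Hypothetical next-token probabilities, reused at every step for simplicity.
probs = [0.5, 0.3, 0.15, 0.05]
picks = [inverse_cdf_pick(probs, u) for u in weyl_points(1000)]
print(picks[:10])
print([picks.count(i) for i in range(len(probs))])  # close to 500, 300, 150, 50
```

Every run produces exactly the same picks, yet token frequencies still track the probabilities.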

Fundamentally, the design of the transformer, and especially its attention-based core, does not require randomness, so calling it a stochastic model is a stretch.
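A minimal sketch of scaled dot-product attention, the core operation mentioned above, in plain Python. It leaves out the learned projection matrices, multiple heads, and masking of a real transformer, and the toy inputs are made up. Dropout, which is random, is normally applied only during training, so at inference time this part of the forward pass is a fixed function of its inputs.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Pure arithmetic on the inputs: no random numbers are drawn, so the
    same inputs always give the same outputs.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical 3-token sequence with 2-dimensional embeddings (self-attention: Q = K = V).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x) == attention(x, x, x))  # True
```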

All LLMs have some random aspects.

Training alone depends heavily on many factors (e.g., initialization of parameters, order of training data, hyperparameters, etc.).
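A toy sketch of two of the training-time random factors listed above (parameter initialization and the order of training data), using plain SGD on a made-up one-variable regression rather than an actual LLM. The helper `train_linear` and the data are hypothetical; hyperparameters are fixed here, so only the seed changes between runs.

```python
import random

def train_linear(data, seed, epochs=3, lr=0.05):
    """Fit y ~ w*x + b with plain SGD.

    The seed controls two sources of randomness: the initial parameters
    and the order in which examples are visited each epoch.
    """
    rng = random.Random(seed)
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # random initialization
    data = list(data)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)  # random order of training data
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# Hypothetical noise-free data from y = 2x + 1 on 21 points in [-1, 1].
data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
print(train_linear(data, seed=0))
print(train_linear(data, seed=1))  # a slightly different model from the same data
```

With a fixed seed, training is reproducible; change the seed and, after a short run, you get a slightly different model from the same data.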

In evaluation (AFAIK this applies to recent models as well), you pick the continuation by chance rather than always the "best" one. And the model being evaluated is the result of the training process, so all the randomness from training factors in as well.

They are stochastic in the domain of meaning. Minor syntactic changes to the prompt or changes to the seed can result in substantial* changes to the meaning of the response.

*substantial as in nontrivial, not substantial as in massive

Isn't that rather "unstable" or "poorly conditioned"?
Sure, I don't think those are mutually exclusive with being stochastic. A stable or well-conditioned model may just have an acceptably small standard deviation for the task at hand.
Similar prompts give similar continuations, not wildly divergent ones, so no.
By definition, most continuations won't be wildly divergent under a stochastic model, because it's following a bell curve.
Stochasticity of meaning is not defined. I think it's an unfortunate use of the term.

Humans are also like that

Being difficult to define rigorously does not preclude its existence or its usefulness as a model. The paper addresses the intuition that humans differ from LLMs with respect to meaning.
Still sounds like a nonsense term. What would non-stochasticity of meaning even be?