

	by seydor 1107 days ago
	LLMs are not stochastic though; they are deterministic and don't even require random numbers, right? The term in general seems unfortunate because the models seem to do more than parroting. LLMs are more like the central pattern generators of the nervous system, able to flexibly create well-coordinated patterns when guided appropriately.

3 comments

dudebro314 1107 days ago

Simulations of Brownian motion are not stochastic though, they are deterministic if you fix their random seed, right?


seydor 1107 days ago

Stochasticity is mandatory for modeling Brownian motion.

Actually transformers do not require randomness at all, so the comparison doesn't apply.


FabHK 1107 days ago

My understanding is the opposite. The entire process results in a "score" over all output tokens, which is then converted into a probability of being picked, using a softmax that takes a temperature as a parameter. With a temperature of zero, the "best" token is always picked, but interestingly enough, that does not give optimal results. So sometimes you want the second or even third best. Thus, a "good" (GPT-like) LLM is intrinsically random.

To put it differently: You can make them deterministic by using a temperature of zero (then the output would be pretty bad and repetitive), or having a "better" temperature and fixing a random seed (then the output would be better, but it would only be deterministic in the same sense as a simulation of Brownian motion with fixed random seed).
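A minimal sketch of the sampling step described above, in plain Python. The scores are made up, and `softmax_with_temperature`, `pick_token`, and `sample_run` are hypothetical names for illustration, not any library's API. It treats a temperature of zero as "always take the top score" and otherwise samples from the softmax.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(scores, temperature, rng):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Limit case: no randomness, always the highest-scoring token.
        return max(range(len(scores)), key=lambda i: scores[i])
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

scores = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens

def sample_run(seed, n=10, temperature=0.8):
    """Draw n tokens with a seeded generator; the same seed gives the same sequence."""
    rng = random.Random(seed)
    return [pick_token(scores, temperature, rng) for _ in range(n)]

print(sample_run(seed=42))                 # reproducible, yet drawn "at random"
print(sample_run(seed=42, temperature=0))  # greedy: the top token every time
```

With the seed fixed, two runs agree exactly, which is the "deterministic in the same sense as a seeded Brownian motion simulation" case.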

https://ai.stackexchange.com/questions/32477/what-is-the-tem...

Section 3.3 in https://www.lesswrong.com/posts/pHPmMGEMYefk9jLeh/llm-basics...

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...


seydor 1107 days ago

So the randomness is not mandatory for the LLM to work; without it the output is just boring. This means that as a language model it still performs perfectly well at modeling language. We just give it some random saltiness for fun.

I would guess the random step is not even mandatory: there is probably a way to replace randomness with a simplified function and still get interesting text. I can't run a simulation but there is no indication here that good randomness is needed.

Fundamentally, the design of the transformer, and especially its attention-based core, does not require randomness, so calling it a stochastic model is a stretch.
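To make that concrete, here is a rough sketch of single-head scaled dot-product attention in plain Python. It is a toy version under simplifying assumptions (no learned projections, no masking, tiny made-up vectors), and the function names are my own. The point is only that no random number generator appears anywhere in it.

```python
import math

def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three made-up token vectors
first = attention(x, x, x)
second = attention(x, x, x)
print(first == second)  # same input, same output
```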


constantcrying 1107 days ago

All LLMs have some random aspects.

Training alone depends heavily on many such factors (e.g. initialization of parameters, order of training data, hyperparameters, etc.).

In evaluation (afaik this applies to recent models as well) you pick the continuation based on chance and not always the "best". But evaluation is the result of the training process, so all the randomness from training factors in as well.
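A toy illustration of the training-side randomness, assuming a seeded generator drives both the initial parameters and the data order. `init_run` is a hypothetical helper, and the 0.02 standard deviation is just a placeholder value, not taken from any real training setup.

```python
import random

def init_run(seed, n_params=4, n_examples=6):
    """Return (initial weights, data order) for a toy run; both depend only on the seed."""
    rng = random.Random(seed)
    # Small random initial weights (placeholder scale).
    weights = [rng.gauss(0.0, 0.02) for _ in range(n_params)]
    # Order in which the training examples would be visited.
    order = list(range(n_examples))
    rng.shuffle(order)
    return weights, order

print(init_run(0))
print(init_run(1))  # a different seed gives a different starting point and data order
```

Everything downstream of such a starting point inherits the choice of seed, even if the later sampling step were made fully greedy.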


enragedcacti 1107 days ago

They are stochastic in the domain of meaning. Minor syntactic changes to the prompt or changes to the seed can result in substantial* changes to the meaning of the response.

*substantial as in nontrivial, not substantial as in massive


8note 1107 days ago

Isn't that rather "unstable" or "poorly conditioned"?


enragedcacti 1107 days ago

Sure, I don't think those are mutually exclusive with stochastic. A stable or well-conditioned model may just have an acceptably small standard deviation for the task at hand.


seydor 1107 days ago

Similar prompts give similar continuations, not wildly diverging ones, so no.


enragedcacti 1107 days ago

By definition, most continuations won't be wildly divergent under a stochastic model because it's following a bell curve.


seydor 1107 days ago

Stochasticity of meaning is not defined. I think it's an unfortunate use of the term.

Humans are also like that.


enragedcacti 1107 days ago

Being difficult to define rigorously does not preclude its existence or its usefulness as a model. The paper addresses how it sees humans as different from LLMs with respect to meaning.


seydor 1107 days ago

Still sounds like a nonsense term. What would be non-stochasticity of meaning?
