by unblough 1019 days ago

> Weird thing is it was designed to model language. It’s surprising that it returns sound answers as often as it does.

Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

> “surprising”, i.e. we don’t really know what happened.

This “i.e.” reads like a sort of pop-sci conclusion.

We know exactly what happened. We programmed it to perform these calculations. It’s actually rather straightforward elementary mathematics.

But what happens is that so many interdependent calculations grow the complexity of the problem until we are unable to hold it in our minds, and to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

As for its effectiveness, familiarity with the field of computational complexity points to high-dimensional polynomial optimization problems being broadly universal solvers.

2 comments

TerrifiedMouse 1018 days ago

> Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority. The early versions, GPT-1 and GPT-2, returned mostly complete nonsense. It was only with GPT-3, when the model got large enough, that it started returning results that are convincing and might even make sense often enough.

Even more mind boggling is the fact that randomness is part of its algorithm, i.e. temperature, and that without it the output is kind of meh.

unblough 1018 days ago

> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you took the same amount of data used for GPT-3+ but scrambled its tokenization before training, THEN I would agree with you that its current behaviour is surprising. But the model was fed data with large swaths of literal question-and-answer constructions. Its overfitting behavior is largely why its parent company is facing so much legal backlash.

> Even more mind boggling is the fact that randomness is part of its algorithm

The randomness applies to token choice at sampling time rather than to anything tuned during training, so it fails to support the "i.e. we don’t really know what happened" sentiment. We do know: we told it to flip a coin, and it did.

> i.e. temperature, and that without it the output is kind of meh.

Both without it and with it. You can turn the temperature up and get bad results, just as you can turn it down and get bad results.
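To make the temperature point concrete, here is a minimal sketch of temperature-scaled sampling. It is not the code of any particular model: the logits, the four-token vocabulary, and the function names are all made up for illustration. It only shows that temperature rescales the scores before the "coin flip", so a low value collapses onto the top token and a high value approaches a uniform draw.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for a 4-token vocabulary (assumption, not real model output).
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)  # seeded, so the "coin flip" is reproducible

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], sample_token(logits, t, rng))
```

With a fixed seed the draw is fully reproducible, which is the sense in which the randomness is something we asked for rather than something we fail to understand.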

If adding a single additional dimension to the polynomial of the solution space turned a nondeterministic problem into a deterministic one, then yes, I would agree with you, that would be surprising.

TerrifiedMouse 1018 days ago

> so fails to support the "i.e. we don’t really know what happened" sentiment

It's less that we don't know what's happening on a micro-level but more that it's surprising that it's producing anything coherent at all on a macro-level - especially with a (necessary) element of randomness in the process.

For the most part we don't seem particularly knowledgeable about what happens on a macro-level. Hallucinations remain an unsolved problem. AI companies can't even make their "guardrails" bulletproof.

blovescoffee 1018 days ago

If you believe LLMs are fully explainable, you should write a paper and submit it to arXiv.

unblough 1018 days ago

I think this is an uncharitable reading of this thread.

I’m arguing against the breathless use of “surprising”.

My grandparent comment explains what I think you overlooked in this dismissive response.

> to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

Explainable but intractable is still far from surprising for me.

blovescoffee 1018 days ago

> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you read through what Hinton or any of his famous students have said, it genuinely was and is surprising. Everything from AlexNet to the jump from GPT-2 to GPT-3 was surprising. We can't actually explain that jump in a formal way; we only have reasonable guesses. If something is unexplainable, it's unpredictable. Prediction without understanding is a vague guess, and the results will come as a surprise.

famouswaffles 1018 days ago

>Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

Lol, researchers were surprised by the mostly incoherent nonsense pre-transformer RNNs were spouting years ago, never mind the near-perfect coherence of later GPT models. To argue otherwise is just plain revisionism.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

unblough 1018 days ago

From your linked post:

> What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me.

This reads more like humanizing the language of the post than any legitimate surprise from the author.

The rest of the post then goes into great detail showing that “we DO really know what happened” to paraphrase the definition the op provides for their use of “surprise”.

> Conclusion: We’ve learned about RNNs, how they work, why they have become a big deal, we’ve trained an RNN character-level language model on several fun datasets, and we’ve seen where RNNs are going.

I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

> In fact, it is known that RNNs are Turing-Complete in the sense that they can to simulate arbitrary programs (with proper weights).

Mathematically proven to be able to do something is about as far from surprise as one can get.

famouswaffles 1018 days ago

>This reads more like humanizing the language of the post than any legitimate surprise from the author.

Lol Sure

>I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

We don't know what the models learn and what they employ to aid in predictions. That is a fact. Going on a gradient descent rant is funny but ultimately meaningless. It doesn't tell you anything about the meaning of the computations.

There is no misplaced incomprehensibility, because the internals and how they meaningfully shape predictions are incomprehensible.

>Mathematically proven to be able to do something is about as far from surprise as one can get.

Magic: The Gathering is Turing complete. I'm sorry, but "theoretically Turing complete" is about as meaningless as it gets. Transformers aren't even Turing complete.
