| HN Mirror


by nuancebydefault 593 days ago

To me it feels like whatever 'proof' you give that LLMs have a model behind them, something other than 'next token prediction', it makes no difference to people who don't 'believe' that. I see this happening over and over on HN.

We don't know how reasoning emerges in humans. I'm pretty sure multimodality helps, but it isn't needed for reasoning, because the extra modalities are just other forms of input, hence just more (if somewhat different) input. A blind person can still form an 'image'.

In the same sense, we don't know how reasoning emerges in LLMs. For me the evidence lies in the results rather than in how it works. The results are evidence enough.

2 comments

cjbprime 592 days ago

The argument isn't that there is something more than next token prediction happening.

The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.


unoti 592 days ago

> The argument isn't that there is something more than next token prediction happening.

> The argument is that next token prediction does not imply an upper bound on intelligence, because an improved next token prediction will pull increasingly more of the world that is described in the training data into itself.

Well said! There's a philosophical rift appearing in the tech community over this very issue, semi-neatly dividing people into naysayers, "disbelievers", and believers.


nuancebydefault 592 days ago

I fully agree. Some people fully disagree, though, on the 'pull of the world' part, let alone the 'intelligence' part, both of which are in fact impossible to define.


corimaith 593 days ago

The reasoning emerges from the long-distance relations between words, which transformers pick up because attention looks at all positions in parallel. That's why they performed so much better than earlier RNNs and LSTMs, which used similar tokenization.
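
To make the "long-distance relations" point concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is a toy under stated assumptions, not how any particular model is implemented: the function name `self_attention` and the 2-d "embeddings" are made up for illustration, queries, keys, and values are all just the raw inputs (real transformers use learned projections and many heads), and there is no causal mask. The point it shows is that the first token can weight the last token directly in one step, no matter how many tokens sit between them, whereas an RNN has to carry that information through every intermediate step.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Toy single-head attention.

    Assumption: queries = keys = values = the input vectors
    (identity projections), purely for illustration.
    """
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Score this position against every position, near or far.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in vectors
        ]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(w_j * v[i] for w_j, v in zip(w, vectors)) for i in range(d)]
        )
    return outputs, weights


# Hypothetical embeddings: the first and last tokens are similar,
# the three in between are unrelated to them.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)

# How much the first token attends to each position.
print([round(w, 3) for w in weights[0]])
```

In this toy setup the first token gives the distant last token the same weight it gives itself, and more than it gives its immediate neighbour. Distance in the sequence plays no role in the score, which is the property the comment is pointing at.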
