by totorovirus 844 days ago
This proves my point that LLMs are simply next-token predictors. There are many interesting properties in which we see an "emergence" of intelligence, but I think it's just humans' inability to hold that much knowledge in active memory.
4 comments

"Next token predictor" isn't quite the burn that it seems like, because perfect next token prediction would require actual understanding. That's because you can almost always cast any question about understanding into a form where it depends solely on the next token (there are a couple nitpicky exceptions and caveats but not many).

GPT-4 is at a high enough level of performance that simple statistics alone aren't really helping it do any better; it really is developing structures, especially in the middle layers, that perform some amount of high-level understanding.

I don't think that pure next token prediction will always be the optimal way to train and enhance these behaviors, but it's not fair to say that it's unrelated. If this really were just stochastic parroting, LLMs would have topped out way before the level they're at now.

That's the thing. Although the source of their knowledge is pure condensed wisdom, which makes for some sort of artificial intelligence, they lack the ability to "think", which is crucial for solving problems.
Mapping language patterns into vector space is most definitely not "pure condensed wisdom".
Thank you for clarifying this. My comment was more about showing signs of intelligence. Maybe I oversimplified my statement.
LLMs literally are next token predictors, so I'm not understanding your broader point.
I think this has always been pretty obvious, but the AI faithful have a vested interest in insisting that LLMs can actually think and solve problems.
More shocking are those who insist that the human brain must then also work by just guessing the next missing thing. As if the thought process behind I'm hungry starts with "I" and then tries to figure out what best fits next... it's absurd.
The token would be the pure sensation of hunger, not the word for self, which is merely a convenient abstraction that we use to share knowledge with others and over time.

LLMs don't have that sensation (why would they?), but that doesn't mean they can only be used for text: https://deepgram.com/learn/applications-of-transformer-model...

Jeez, I don't know how you would think that I thought that LLMs would have a sensation of hunger. That was not even close to my point.
> As if the thought process behind I'm hungry starts with "I"

That sounds like {you think that {people who think LLMs work like humans} believe that {the human sensation of hunger} is merely {saying the phrase "I am hungry"}}.