by elicksaur 671 days ago
> Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document…

https://openai.com/index/gpt-4-research/

What humans do is materially different than that. When someone asks me a question, I don’t come up with an answer by thinking, “What’s the first word of my response going to be? The second word?…”

I understand that the AI marketing wants us to believe there’s more magic than that quote, but the actual technical descriptions of the models are what should be considered.

Also, skepticism =/= disappointment and swapping those out greatly changes what the sentence says about my feelings on the matter. Tech from OpenAI and friends can’t really disappoint me. I have no expectation that it won’t just be a money grab ;)


> I don’t come up with an answer by thinking, “What’s the first word of my response going to be? The second word?…”

Actually, I'm not so sure that isn't exactly what we do. That's why it's called a "train of thought". You have a vague idea and you start talking and lo and behold out comes a pretty coherent encapsulation of your idea that is informed and bounded by the token relationships of your language.

Try answering a question with the order of your sentence reversed and you'll find it damn difficult. That answer of yours is not sitting there fully formed, just waiting for your mouth to get it all out. You're coming up with the answer one token at a time.

I usually try to think about what I’m going to say before I say it. My train of thought for this comment certainly did not start with “I usually”.

Like Heptapod B.

The "next word" argument is pervasive and disappointingly ridiculous. If you present an LLM with a logic puzzle and it gives you the correct answer, how did it "predict the next word"? Yes, the output came in the form of additional tokens. But if those tokens required logical thought, it's a mistake to see the what as the how.

Maybe it’s pervasive because it’s literally the architecture of these models?
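
For anyone who wants the "predict the next word" loop spelled out, here is a minimal sketch. It is a toy bigram counter, not a transformer and not anything from OpenAI; the corpus and the names `train_bigram` and `generate` are made up for illustration. A real LLM conditions on the whole preceding context rather than just the last token, but as far as I understand it the outer generation loop has roughly this shape: pick a next token, append it, repeat.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_tokens=10):
    """Greedy autoregressive decoding: repeatedly append the most likely next token."""
    out = [start]
    for _ in range(max_tokens):
        followers = counts.get(out[-1])
        if not followers:
            # The last token was never followed by anything in the corpus.
            break
        # most_common(1) returns a list holding one (token, count) pair.
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical toy corpus, chosen so the most likely follower is never tied.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", max_tokens=5)))
```

The loop itself says nothing about how the next token gets chosen. Here it is a lookup table; in an LLM it is a large network. That gap is what the two sides of this thread are arguing about.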