by ynniv 668 days ago
> My disappointment comes from understanding that what humans do is keystroke prediction. If the output that I want can be solved by the most likely next keystroke, then sure, that’s a good use case. I’m perfectly capable of imagining those cases. People who are all in on humanity seem to not get this and go wild.

Don't mistake the "what" for the "how". What we ask LLMs to do is predict tokens. How they're any good at doing that is a more difficult question to answer, and how they are getting better at it, even with the same training data and model size, is even less clear. We don't program them, we have them train themselves. And there are a huge number of hidden variables that could be encoding things in weird ways.

These aren't n-gram models, and you're not going to make good predictions treating them as such.
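For readers unfamiliar with the term, here is a minimal sketch of what an n-gram predictor looks like, using n = 2 (a bigram model). The function names `train_bigram` and `predict_next` and the toy corpus are made up for illustration. The "model" is a table of counts: it can only repeat followers it has literally seen, which is the behavior the comment above says LLMs should not be assumed to share.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)  # .get avoids creating an empty entry
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))  # cat  ("cat" follows "the" twice, "mat" once)
print(predict_next(model, "dog"))  # None (never seen, so no prediction at all)
```

The prediction depends on nothing but the single previous token and a lookup table, so there is no room for hidden variables to encode anything.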


> Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document…

https://openai.com/index/gpt-4-research/

What humans do is materially different than that. When someone asks me a question, I don’t come up with an answer by thinking, “What’s the first word of my response going to be? The second word?…”

I understand that the AI marketing wants us to believe there’s more magic than that quote, but the actual technical descriptions of the models are what should be considered.

Also, skepticism =/= disappointment and swapping those out greatly changes what the sentence says about my feelings on the matter. Tech from OpenAI and friends can’t really disappoint me. I have no expectation that it won’t just be a money grab ;)

> I don’t come up with an answer by thinking, “What’s the first word of my response going to be? The second word?…”

Actually, I'm not so sure that isn't exactly what we do. That's why it's called a "train of thought". You have a vague idea and you start talking and lo and behold out comes a pretty coherent encapsulation of your idea that is informed and bounded by the token relationships of your language.

Try answering a question with the order of your sentence reversed and you'll find it damn difficult. That answer of yours is not completely well formed just waiting for your mouth to get it all out. You're coming up with the answer one token at a time.

I usually try to think about what I’m going to say before I say it. My train of thought for this comment certainly did not start with “I usually”.

Like Heptapod B. The "next word" argument is pervasive and disappointingly ridiculous. If you present an LLM with a logic puzzle and it gives you the correct answer, how did it "predict the next word"? Yes, the output came in the form of additional tokens. But if those tokens required logical thought, it's a mistake to see the what as the how.

Maybe it’s pervasive because it’s literally the architecture of these models?
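
Both sides of this exchange can be made concrete with a rough sketch of an autoregressive decoding loop. Everything here is hypothetical: `generate`, `toy_adder`, and the `<eos>` stop token are names invented for illustration, and `toy_adder` is a hand-written stand-in, not a claim about how any real model computes its output. The loop is the "what" (emit one token at a time); the function plugged into it is the "how", and the loop places no limit on how much computation that function does over the whole context.

```python
from typing import Callable, Dict, List

# A "model" here is any function from the context so far to next-token probabilities.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def generate(next_token_probs: NextTokenFn, prompt: List[str],
             max_new: int, stop: str = "<eos>") -> List[str]:
    """Greedy autoregressive decoding: append the most likely token, repeat."""
    tokens = list(prompt)
    for _ in range(max_new):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # greedy choice
        if token == stop:
            break
        tokens.append(token)
    return tokens


def toy_adder(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in 'model': does arithmetic on the context, then stops."""
    if len(tokens) >= 4 and tokens[-1] == "=":
        a, b = int(tokens[-4]), int(tokens[-2])
        return {str(a + b): 1.0}
    return {"<eos>": 1.0}


print(generate(toy_adder, ["2", "+", "3", "="], max_new=5))
# ['2', '+', '3', '=', '5']
```

The interface is next-token prediction in every case, which is the architectural point; what happens inside the function is a separate question, which is the "what" versus "how" point.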