by underwater 1291 days ago
That joke is a great example of why the creativity is surprising.

A human might have a thought process that starts with the idea that people are food for Bigfoot, and then connects that to the phrase "playing with your food".

But GPT generates responses word by word. And it operates at a word (token) level, rather than thinking about the concepts abstractly. So it starts with "Because it likes to play" which is a predictable continuation that could end in many different ways. But it then delivers the punchline of "with its food".

Was it just a lucky coincidence that it found an ending to the sentence that paid off so well? Or is the model so sophisticated that it can suggest the word "play" because it can predict the punchline related to "food"?


I think what you are saying is just not true in the case of GPT-style LLMs. The output is not just single-word generation, one word at a time. The model is indeed taking into account the entire structure, the preceding structures, and to a certain extent the abstractions inherent to that structure, throughout the model. Just because it tokenizes input doesn't mean it is seeing things word by word or outputting word by word. Transformers are not just fancy LSTMs. The whole point of transformers is that they take the input in parallel, whereas RNNs are sequential.
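To make the "input in parallel" point concrete, here is a rough sketch of causal self-attention in plain Python. It is a toy under loud assumptions: the names `causal_attention` and `toy_embeddings` are made up for illustration, the vectors are hand-picked, and the learned query/key/value projections and multiple heads of a real transformer are left out.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(vectors):
    """For each position i, blend vectors[0..i] by dot-product similarity.

    Simplification (assumption): the same vector serves as query, key,
    and value; a real model applies learned projections first.
    """
    dim = len(vectors[0])
    outputs = []
    # Each iteration depends only on `vectors`, not on earlier outputs,
    # so all positions could be computed at the same time.
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # causal mask: no peeking at later tokens
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[d] for w, v in zip(weights, visible))
            for d in range(dim)
        ]
        outputs.append(mixed)
    return outputs

# Hypothetical 2-dimensional "embeddings" for a three-token input.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = causal_attention(toy_embeddings)
```

The first position can only see itself, so its output equals its input; the last position blends all three vectors. Unlike an RNN, no position has to wait for the previous position's result.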
It seems I'd gotten the wrong impression of how it works. Do you have any recommendations for primers on GPT and similar systems? Most content seems to be either surface level or technical and opaque.
No. You got the right impression. It is indeed doing "next token prediction" in an autoregressive way, over and over again.

The best source would be the GPT-3 paper itself: https://paperswithcode.com/method/gpt-3
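
For anyone who wants to see what "next token prediction in an autoregressive way" looks like as a loop, here is a minimal sketch. The `toy_model` function and its probability table are entirely made up (they stand in for a real network, and the numbers are not from GPT); the only point is that each step looks at the whole prefix and emits exactly one token, which is then fed back in.

```python
# Hypothetical probability table keyed by the ENTIRE prefix so far.
PREFIX = ("Because", "it", "likes", "to")
TOY_TABLE = {
    PREFIX: {"play": 0.6, "eat": 0.4},
    PREFIX + ("play",): {"with": 0.7, "<end>": 0.3},
    PREFIX + ("play", "with"): {"its": 0.9, "the": 0.1},
    PREFIX + ("play", "with", "its"): {"food": 0.8, "toys": 0.2},
}

def toy_model(tokens):
    """Stand-in for a language model: next-token probabilities given all tokens so far."""
    return TOY_TABLE.get(tuple(tokens), {"<end>": 1.0})

def generate(prompt, max_tokens=10):
    """Greedy autoregressive decoding: pick the likeliest token, append it, repeat."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = toy_model(tokens)          # conditions on the full prefix
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)          # output becomes part of the next input
    return tokens

print(" ".join(generate(["Because", "it", "likes", "to"])))
```

Note that the model commits to "play" before "food" has been produced. Whether a real model has already "planned" the punchline internally at that point is exactly the question raised at the top of the thread; this sketch does not settle it.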