HN Mirror


by underwater 1337 days ago

That joke is a great example of why the creativity is surprising.

A human might have a thought process that starts with the idea that people are food for Bigfoot, and then connects that to the phrase "playing with your food".

But GPT generates responses word by word. And it operates at a word (token) level, rather than thinking about the concepts abstractly. So it starts with "Because it likes to play" which is a predictable continuation that could end in many different ways. But it then delivers the punchline of "with its food".

Was it just a lucky coincidence that it found an ending to the sentence that paid off so well? Or is the model so sophisticated that it can suggest the word "play" because it can predict the punchline related to "food"?


mk_stjames 1337 days ago

I think what you are saying is just not true of GPT-style LLMs. The output is not just single-word generation, one word at a time. It is indeed taking into account the entire structure, preceding structures, and to a certain extent abstractions inherent to the structure throughout the model. Just because it tokenizes input doesn't mean it is seeing things word by word or outputting word by word. Transformers are not just fancy LSTMs. The whole point of transformers is that they take the input in parallel, whereas RNNs are sequential.
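To make the "entire structure" point concrete, here is a minimal sketch of causal self-attention in plain Python. It is a simplification for illustration, not GPT's actual implementation: a single head, no learned projections, and the function name causal_attention and the toy vectors are hypothetical. What it shows is that each position is computed from every token up to and including itself, and that no position depends on another position's output, so all of them could be computed in parallel.

```python
import math
from typing import List

Vector = List[float]


def causal_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with a causal mask.

    Position i may look at positions 0..i only (never at later tokens).
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Scores against every visible position, not just the previous one.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted mix of the value vectors of all visible positions.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


# Toy "embeddings" for three tokens (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(x, x, x)
# Position 0 can only attend to itself, so out[0] equals x[0].
```

The loop over i is written sequentially only for readability; since each iteration reads nothing but the inputs, a real implementation does the same work as one batched matrix operation.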


underwater 1337 days ago

It seems I'd gotten the wrong impression of how it works. Do you have any recommendations for primers on GPT and similar systems? Most content seems to be either surface level or technical and opaque.


fjkdlsjflkds 1337 days ago

No. You got the right impression. It is indeed doing "next token prediction" in an autoregressive way, over and over again.

The best source would be the GPT-3 paper itself: https://paperswithcode.com/method/gpt-3
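As a rough illustration of what "next token prediction in an autoregressive way" means, here is a minimal sketch of a greedy decoding loop. The lookup table TOY_MODEL, its probabilities, and the function names are all made up for this example and stand in for a trained network; the only point is the shape of the loop: predict one token from the whole prefix, append it, repeat.

```python
from typing import Dict, List, Tuple

END = "<end>"

# Hypothetical stand-in for a trained model: maps the WHOLE prefix so far
# to a probability for each candidate next token (numbers are invented).
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    ("Because",): {"it": 0.9, "they": 0.1},
    ("Because", "it"): {"likes": 0.7, "is": 0.3},
    ("Because", "it", "likes"): {"to": 0.8, "its": 0.2},
    ("Because", "it", "likes", "to"): {"play": 0.6, "eat": 0.4},
    ("Because", "it", "likes", "to", "play"): {"with": 0.7, END: 0.3},
    ("Because", "it", "likes", "to", "play", "with"): {"its": 0.9, "the": 0.1},
    ("Because", "it", "likes", "to", "play", "with", "its"): {"food": 0.8, "toys": 0.2},
    ("Because", "it", "likes", "to", "play", "with", "its", "food"): {END: 1.0},
}


def next_token_distribution(prefix: List[str]) -> Dict[str, float]:
    """Return next-token probabilities given the entire prefix."""
    return TOY_MODEL.get(tuple(prefix), {END: 1.0})


def generate(prompt: List[str], max_new_tokens: int = 20) -> List[str]:
    """Greedy autoregressive decoding: pick the likeliest token, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        token = max(dist, key=dist.get)
        if token == END:
            break
        tokens.append(token)  # the new token becomes part of the next input
    return tokens


print(" ".join(generate(["Because"])))  # prints "Because it likes to play with its food"
```

Note that "play" is emitted before "with its food" exists in the output. Whether the probability a real model assigns to "play" already reflects the coming punchline is exactly the question raised in this thread, and a toy table like this one cannot answer it.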
