
by clementneo 1215 days ago

The main issue is that GPT is fundamentally an autoregressive language model: it only predicts the next token, one token at a time, based on the prompt. Each time it predicts a word, it appends that word to the prompt and repeats the cycle. We can intuitively guess that the model is 'working out a response that is eventually going to have "apple" in it', but we don't actually know how the model 'thinks' ahead about its response.
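
To make the loop concrete, here is a toy sketch of autoregressive generation. The lookup table `NEXT_TOKEN` and the function names are hypothetical stand-ins: a real GPT-2 computes a probability distribution over its whole vocabulary with a transformer that attends to the entire prompt, whereas this toy only looks at the last token. The point is the shape of the loop: one token per step, appended to the prompt before the next step.

```python
# Hypothetical next-token table; a real model scores every vocabulary token
# using the full prompt, not just the last word.
NEXT_TOKEN = {
    "I": "ate",
    "ate": "an",
    "an": "apple",
    "apple": ".",
}


def predict_next(prompt: list[str]) -> str:
    """Stand-in for one forward pass: receives the whole prompt, returns one token.

    This toy only inspects the last token (a bigram lookup).
    """
    return NEXT_TOKEN.get(prompt[-1], ".")


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)  # exactly one token is chosen per step
        tokens.append(nxt)          # ...and fed back in as part of the next prompt
        if nxt == ".":
            break
    return tokens


print(" ".join(generate(["I"])))  # I ate an apple .
```

Note that when the toy emits "an", nothing in that single step "knows" about "apple"; the table just happens to be consistent. Whether GPT-2 does something similar or actually plans ahead is exactly what the question is asking.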

To rephrase that for this case: what is the specific mechanism in GPT-2 that (1) makes it realise that the word 'apple' is significant in this prompt, and (2) uses that knowledge to push the model to predict 'an'? Finding this neuron would only answer part of (2).

(And to rephrase this for the general case, which gives us the initial question: How does GPT-2 know when, given a suitable context, to predict 'an' over 'a'?)

IgorPartola 1215 days ago

Do an improv exercise with a friend. Construct a story about a sentient apple one word at a time. I guarantee you that at some point one of you will set up the other with an “an” in your story. I believe this is how it works.

minkzilla 1215 days ago

Your friend is thinking ahead that the next word will be ‘apple’ when they say ‘an’.

But GPT can’t think ahead about which token it will add after the one it is on. Or can it? It could internally “predict” apple as the next “meaningful” word and output ‘an’ because of that.
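
One way to picture this “internal prediction” hypothesis is a toy step that privately picks the most likely upcoming noun and then emits only the article that agrees with it. Everything here is hypothetical: the scores in `NOUN_SCORES`, the names `lookahead_step` and `article_for`, and the vowel-letter rule (real English a/an depends on sound, e.g. “an hour”, “a unicorn”). It illustrates the idea, not how GPT-2 is known to work.

```python
# Hypothetical scores for which noun the context "points at".
NOUN_SCORES = {"apple": 7, "banana": 2, "orange": 1}


def article_for(noun: str) -> str:
    # Crude spelling-based rule; English a/an actually depends on pronunciation.
    return "an" if noun[0].lower() in "aeiou" else "a"


def lookahead_step(scores: dict[str, int]) -> tuple[str, str]:
    """Emit only an article, but choose it from an internally planned next noun."""
    planned = max(scores, key=scores.get)  # the "next meaningful word", not output yet
    return article_for(planned), planned


article, planned = lookahead_step(NOUN_SCORES)
print(article)  # an
```

Under this hypothesis, the planned noun exists only as internal state; the only token actually emitted in this step is the article.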

IgorPartola 1215 days ago

Or it could keep outputting articles where appropriate until it outputs an “an” and then output apple having teed itself up.
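
A toy sketch of this “teed-up” alternative: no noun is planned in advance; once an article is already in the prompt, the next-noun distribution is simply conditioned on it. The counts in `NOUN_COUNTS`, the function names, and the spelling-based a/an rule are hypothetical, and the hard filter is a simplification (a real model would only shift probability toward compatible nouns).

```python
# Hypothetical counts for which noun the context favours; nothing is planned ahead.
NOUN_COUNTS = {"apple": 7, "banana": 2, "orange": 1}


def article_for(noun: str) -> str:
    # Crude spelling-based rule; English a/an actually depends on pronunciation.
    return "an" if noun[0].lower() in "aeiou" else "a"


def noun_given_article(article: str, counts: dict[str, int]) -> dict[str, float]:
    """Next-noun distribution once `article` has already been emitted."""
    compatible = {n: c for n, c in counts.items() if article_for(n) == article}
    total = sum(compatible.values())
    return {n: c / total for n, c in compatible.items()}


print(noun_given_article("an", NOUN_COUNTS))  # {'apple': 0.875, 'orange': 0.125}
print(noun_given_article("a", NOUN_COUNTS))   # {'banana': 1.0}
```

Here “apple” only becomes reachable after an “an” happens to be emitted; after “a”, the model would have to continue with another noun (or another article later) before it can get to “apple”.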