by seppel 1212 days ago
I don't think you are missing anything. I think this whole "GPT-2 is predicting only one word at a time" argument is a red herring anyway.

Of course it can only output the next word, because there is only room in its output for the next word. But it has to compute much more. It has a huge hidden internal state in which it first has to encode what the given sentence is about, then predict the general concept the continuation is heading toward, then decide on the locally correct syntactic structure, and only from all of that can it predict the next word.
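
To make that concrete, here is a toy sketch of the shape of the computation. It is not GPT-2: the vocabulary, the random weights, and the names `hidden_state`, `next_token_distribution` and `generate` are all made up for illustration, and the "model" is a weighted sum instead of a transformer. The only point is that each step builds a wide internal vector from the whole context and then squeezes it down to a single word.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]
HIDDEN = 16  # internal state is much wider than the one token emitted per step

rng = random.Random(0)
# Hypothetical random weights; a real model learns these from data.
EMBED = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in VOCAB]
W_OUT = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in VOCAB]


def hidden_state(token_ids):
    """Summarise the whole context in one vector (stand-in for the transformer stack)."""
    h = [0.0] * HIDDEN
    for pos, tid in enumerate(token_ids):
        weight = 1.0 / (len(token_ids) - pos)  # more recent tokens count more
        for i in range(HIDDEN):
            h[i] += weight * EMBED[tid][i]
    return [math.tanh(x) for x in h]


def next_token_distribution(token_ids):
    """Project the hidden state to a probability for each vocabulary entry."""
    h = hidden_state(token_ids)
    logits = [sum(w * x for w, x in zip(row, h)) for row in W_OUT]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_ids, steps):
    """Greedy decoding: one token per step, recomputed from the full context."""
    ids = list(prompt_ids)
    for _ in range(steps):
        probs = next_token_distribution(ids)
        ids.append(max(range(len(VOCAB)), key=probs.__getitem__))
    return ids


if __name__ == "__main__":
    out = generate([0, 1], steps=4)  # prompt: "the cat"
    print(" ".join(VOCAB[i] for i in out))
```

With random weights the output is gibberish, of course. But the bottleneck is in the right place: `generate` only ever appends one token, while `hidden_state` looks at everything that came before.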