by mk_stjames 1291 days ago
I think what you are saying is just not true of GPT-style LLMs. The output is not just single-word generation, one word at a time. It does take into account the entire structure, the preceding structures, and to a certain extent the abstractions inherent to the structure throughout the model. Just because it tokenizes input doesn't mean it sees things word by word or outputs word by word. Transformers are not just fancy LSTMs. The whole point of transformers is that they take the input in parallel, whereas RNNs are sequential.
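To make the parallel-versus-sequential point concrete, here is a rough toy sketch, not how any real model is implemented. It assumes scalar inputs and uses the input itself as query, key and value; the names `rnn_states` and `causal_attention_at` and the weights `w_x`, `w_h` are made up for illustration. The RNN has to walk the sequence in order because each state needs the previous one, while each attention output depends only on the inputs, so the positions can be computed in any order (or all at once).

```python
import math

def rnn_states(xs, w_x=0.5, w_h=0.9):
    # Sequential: the hidden state at step t needs the state from step t-1.
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def causal_attention_at(xs, t):
    # Toy single-head attention with query = key = value = the raw input.
    # The output at position t reads inputs xs[0..t] only, never other outputs.
    scores = [xs[t] * xs[j] for j in range(t + 1)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * xs[j] for j, w in enumerate(weights)) / total

xs = [0.1, -0.4, 0.7, 0.2]
states = rnn_states(xs)

# Attention positions are independent, so evaluation order does not matter.
in_order = [causal_attention_at(xs, t) for t in range(len(xs))]
reversed_order = [causal_attention_at(xs, t) for t in reversed(range(len(xs)))][::-1]
print(in_order == reversed_order)
```

Note this only covers how a given sequence is processed. It says nothing by itself about how new tokens get produced.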

It seems I'd gotten the wrong impression of how it works. Do you have any recommendations for primers on GPT and similar systems? Most content seems to be either surface-level or technical and opaque.
No. You got the right impression. It is indeed doing "next token prediction" in an autoregressive way, over and over again.
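A minimal sketch of what "next token prediction, over and over again" looks like as a loop, assuming greedy decoding. The `generate` function, the `TABLE` lookup and `toy_scores` are hypothetical stand-ins, not a real model or library API; a real LLM would compute the scores from the whole context with a neural network.

```python
def generate(next_token_scores, prompt, max_new_tokens, eos="<eos>"):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The model is handed everything generated so far on every step.
        scores = next_token_scores(tokens)
        # Greedy choice: take the highest-scoring token (real systems often sample).
        next_token = max(scores, key=scores.get)
        if next_token == eos:
            break
        # The chosen token becomes part of the input for the next step.
        tokens.append(next_token)
    return tokens

# Hypothetical stand-in for a model: a lookup keyed on the last two tokens.
TABLE = {
    ("the", "cat"): {"sat": 0.9, "<eos>": 0.1},
    ("cat", "sat"): {"down": 0.6, "<eos>": 0.4},
    ("sat", "down"): {"<eos>": 1.0},
}

def toy_scores(tokens):
    return TABLE.get(tuple(tokens[-2:]), {"<eos>": 1.0})

result = generate(toy_scores, ["the", "cat"], max_new_tokens=10)
print(result)
```

Each pass through the loop emits exactly one token and feeds it back in, which is the autoregressive part.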

The best source would be the GPT-3 paper itself: https://paperswithcode.com/method/gpt-3