by HarHarVeryFunny 754 days ago
Right, it's not doing anything between prompts, but each prompt is fed through each of the transformer layers in turn (I think it was 96 layers for GPT-3), so we can think of this as a fixed N steps of "thought" (analyzing the prompt in a hierarchical fashion) to generate each token.
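To make the "fixed N steps per token" point concrete, here is a minimal toy sketch in plain Python. It assumes a hypothetical `toy_layer` in place of a real attention+MLP block, and it feeds the last hidden state back in as the "next token" instead of sampling. All names are illustrative and do not come from any real model's API. The point is only that every generated token costs one full pass through all N layers, so the sequential depth per token is the same however long the prompt is.

```python
from typing import List, Tuple

Vector = List[float]

def toy_layer(seq: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for one transformer block.

    Real blocks use attention plus an MLP. Here each position is blended with
    the mean of itself and all earlier positions (a crude causal 'attention').
    """
    out = []
    for i, vec in enumerate(seq):
        prefix = seq[: i + 1]
        mean = [sum(col) / len(prefix) for col in zip(*prefix)]
        out.append([0.5 * v + 0.5 * m for v, m in zip(vec, mean)])
    return out

def forward(seq: List[Vector], n_layers: int) -> Tuple[List[Vector], int]:
    """Run the whole sequence through n_layers blocks, one after another."""
    steps = 0
    for _ in range(n_layers):
        seq = toy_layer(seq)
        steps += 1  # one sequential layer step (one "step of thought")
    return seq, steps

def generate(prompt: List[Vector], n_new: int, n_layers: int) -> List[int]:
    """Autoregressive loop: one full forward pass per new token.

    Returns the number of sequential layer steps spent on each new token.
    """
    seq = list(prompt)
    steps_per_token = []
    for _ in range(n_new):
        hidden, steps = forward(seq, n_layers)
        steps_per_token.append(steps)
        seq.append(hidden[-1])  # toy: feed the last hidden state back in
    return steps_per_token

print(generate([[1.0, 0.0], [0.0, 1.0]], n_new=3, n_layers=96))  # [96, 96, 96]
```

A longer prompt makes each layer do more work in parallel across positions, but it does not add layers. In this sketch the per-token step count stays at `n_layers` no matter how long the prompt is.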