


	by HarHarVeryFunny 801 days ago
	Right, it's not doing anything between prompts, but each prompt is fed through each of the transformer layers in turn (I think it was 96 layers for GPT-3), so we can think of this as a fixed N steps of "thought" (analyzing the prompt in hierarchical fashion) to generate each token.
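
	To make the "fixed N steps" point concrete, here's a toy sketch. It is not GPT-3's actual code: `toy_layer`, `next_token` and `generate` are hypothetical names, and the arithmetic inside the layer is a made-up stand-in for attention plus MLP. The only thing it's meant to show is that the loop over layers has the same length for every generated token, however long the prompt is.

```python
N_LAYERS = 96  # the layer count recalled above for GPT-3; any fixed N makes the same point


def toy_layer(hidden, layer_idx):
    """Hypothetical stand-in for one transformer block.

    Each position is mixed with the running average of itself and all
    earlier positions (a crude imitation of causal attention).
    """
    out = []
    running = 0.0
    for pos, h in enumerate(hidden):
        running += h
        out.append(0.5 * h + 0.5 * running / (pos + 1) + 0.001 * layer_idx)
    return out


def next_token(tokens, n_layers=N_LAYERS):
    """Run the whole sequence through every layer once, then read out one token."""
    hidden = [float(t) for t in tokens]
    layer_calls = 0
    for i in range(n_layers):  # fixed depth: no early exit, no extra passes
        hidden = toy_layer(hidden, i)
        layer_calls += 1
    token = int(hidden[-1] * 1000) % 50  # toy readout into a 50-token vocabulary
    return token, layer_calls


def generate(prompt, n_new, n_layers=N_LAYERS):
    """Autoregressive loop: each new token costs exactly n_layers layer steps."""
    tokens = list(prompt)
    calls_per_token = []
    for _ in range(n_new):
        tok, calls = next_token(tokens, n_layers)
        tokens.append(tok)
        calls_per_token.append(calls)
    return tokens, calls_per_token


if __name__ == "__main__":
    _, short_calls = generate([3, 1, 4], n_new=5)
    _, long_calls = generate(list(range(20)), n_new=5)
    print(short_calls)
    print(long_calls)
```

	Both printed lists hold the same number repeated: the count of sequential layer steps per token depends on the depth of the stack, not on the prompt. (The work inside each layer does grow with sequence length, but the number of steps doesn't.)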