| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hackinthebochs 1218 days ago
	It's easy to see that its not just doing one token at a time but is anticipating future tokens. Consider the context of a Q&A. The response might start with any of a number of words, exactly which word depends on what comes after. But if it randomly chooses the wrong word, it will either be forced to complete the wrong answer, or be backed into a corner and engage in circumlocutions to course-correct. This doesn't happen in practice for recent big models.