| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lostmsu 1014 days ago
	I had a conversation with a friend regarding this exact question and my understanding is that model trains to optimize the distribution of all texts, therefore when you restrict it to deterministic sampling that is not representative of inputs you select the slice of the distribution that model learned that conveys much less information than the full distribution, and hence has poorer results.