HN Mirror

	by Bayano2 57 days ago
	This was true circa GPT-2, less true after RLHF, and not true at all after RLVR. A model trained that way is trying to match the distribution of outputs most likely to solve the problem, not the average distribution of its training text.
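
	To make the distinction concrete, here is a toy sketch (not from the comment, and nothing like a real training pipeline). It assumes a made-up three-answer "policy" whose starting probabilities stand in for a pretrained distribution where the most common answer is wrong, and a hypothetical verifier that pays reward 1 only for the correct answer. The names `train_with_verifiable_reward`, `pretrained_logits`, and `correct_index` are invented for illustration. A plain REINFORCE update moves probability mass away from the "average" answer and toward the one that passes the check.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_with_verifiable_reward(logits, correct_index, steps=2000, lr=0.5, seed=0):
    """Toy REINFORCE loop: sample an answer, reward it only if it verifies."""
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        # Hypothetical verifier: reward 1 for the correct answer, else 0.
        reward = 1.0 if action == correct_index else 0.0
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits


# Assumed "pretrained" distribution: the most frequent answer (index 0) is wrong.
pretrained_logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
correct_index = 2

before = softmax(pretrained_logits)
after = softmax(train_with_verifiable_reward(pretrained_logits, correct_index))

print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

	Before training, sampling mostly returns the popular wrong answer; afterwards nearly all the mass sits on the answer the verifier accepts, which is the shift the comment is describing.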