HN Mirror



	by zaptrem 423 days ago
	I think when they were figuring out RLHF, they avoided this by interleaving RLHF updates with ordinary cross-entropy gradient steps on the training set.
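	A toy sketch of what that interleaving looks like, assuming a single softmax "policy" over 3 tokens, exact gradients instead of sampled PPO updates, and a made-up reward and training-set distribution (`rewards`, `data_dist`, and `train` are all hypothetical). It isn't anyone's actual training setup. Pure reward ascent collapses onto the rewarded token, while alternating it with cross-entropy steps keeps probability mass near the training distribution.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rl_grad(logits, rewards):
    # Exact gradient of expected reward E_p[r] w.r.t. the logits of a
    # softmax policy: d/dz_j = p_j * (r_j - E_p[r]).
    p = softmax(logits)
    baseline = sum(pk * rk for pk, rk in zip(p, rewards))
    return [pj * (rj - baseline) for pj, rj in zip(p, rewards)]

def ce_grad(logits, data_dist):
    # Gradient of the cross-entropy -sum_k q_k log p_k w.r.t. the logits: p - q.
    p = softmax(logits)
    return [pj - qj for pj, qj in zip(p, data_dist)]

def train(steps, lr, interleave):
    logits = [0.0, 0.0, 0.0]
    rewards = [1.0, 0.0, 0.0]    # hypothetical reward model: only likes token 0
    data_dist = [0.2, 0.5, 0.3]  # hypothetical training-set token frequencies
    for step in range(steps):
        if interleave and step % 2 == 1:
            # Odd steps: ordinary cross-entropy descent on the training data.
            g = ce_grad(logits, data_dist)
            logits = [z - lr * gz for z, gz in zip(logits, g)]
        else:
            # Even steps (or every step if not interleaving): reward ascent.
            g = rl_grad(logits, rewards)
            logits = [z + lr * gz for z, gz in zip(logits, g)]
    return softmax(logits)

if __name__ == "__main__":
    print("RL only:    ", [round(x, 3) for x in train(2000, 0.5, interleave=False)])
    print("Interleaved:", [round(x, 3) for x in train(2000, 0.5, interleave=True)])
```

	With RL only, nearly all probability ends up on token 0; with interleaving, the policy settles around a compromise (roughly p0 of about 0.45 here) instead of collapsing.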