
Hacker News


	by jacobr1 772 days ago
	One wrinkle is that it is now common to fine-tune on previously derived RL datasets, using the tested inputs and the preferred sample outputs as the training data.
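	A minimal sketch of that conversion follows. It assumes a hypothetical preference-data schema of (prompt, chosen, rejected) records. The rejected sample is dropped, and each tested input paired with its preferred output becomes an ordinary supervised fine-tuning example. The names `PreferenceRecord`, `SFTExample`, and `to_sft` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PreferenceRecord:
    # Hypothetical schema: the tested input, the sample raters preferred,
    # and the sample they rejected.
    prompt: str
    chosen: str
    rejected: str


@dataclass
class SFTExample:
    # Plain supervised pair: the model is trained to emit `completion` given `prompt`.
    prompt: str
    completion: str


def to_sft(records: list[PreferenceRecord]) -> list[SFTExample]:
    """Keep the tested input and the preferred output; discard the rejected sample."""
    return [SFTExample(prompt=r.prompt, completion=r.chosen) for r in records]


if __name__ == "__main__":
    data = [
        PreferenceRecord("What is 2+2?", "4", "5"),
        PreferenceRecord("Capital of France?", "Paris", "Lyon"),
    ]
    for ex in to_sft(data):
        print(ex.prompt, "->", ex.completion)
```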