| HN Mirror



	by jdietrich 858 days ago
	If you train the model purely on win rate, sure. Fortunately, RLHF lets us efficiently train a model to play in a human-like way and produce entertaining matches.