Hacker News
by jdietrich 858 days ago
If you train the model purely on win rate, sure. Fortunately, we can use RLHF to efficiently train a model that plays in a human-like way and gives entertaining matches.