Hacker News
by jdietrich, 858 days ago
If you train the model purely based on win rate, sure. Fortunately, we can efficiently use RLHF to train a model to play in a human-like way and give entertaining matches.