Hacker News
by Y_Y, 1208 days ago
That's not what RLHF is. In the thunderdome, as in chess, you don't need human judges or an oracle to know who's won. That makes a significant difference to the training procedure.