Hacker News
by Y_Y 1208 days ago
That's not what RLHF is. In the thunderdome, as in chess, you don't need human judges or an oracle to know who's won. That makes a significant difference to the training procedure.