Hacker News
by karfly 1249 days ago
IMO all the RLHF stuff is mainly about aligning the model so that it doesn't reply with offensive or inappropriate answers, NOT about making the model's answers better.
1 comment

Ah, that makes sense. Thanks!