Hacker News
by otabdeveloper4 23 hours ago
That's exactly what RLHF is for.

(In fact, "that colleague" might have even been the source of the RLHF training set.)