Hacker News
by deevolution 1071 days ago
Aren't they using RLHF? The feedback from humans might not always be the "right" feedback. Couldn't that degrade the quality of its responses?