Hacker News
by deevolution, 1071 days ago
Aren't they using RLHF? The feedback from humans might not always be the ~right~ feedback. Couldn't that degrade the quality of the model's responses?